The first 10 files

Arne Vajhøj

On 1/26/2013 6:21 PM, Peter Duniho wrote:
On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:

On 1/26/2013 4:15 PM, Robert Klemme wrote:
On 26.01.2013 19:26, Arne Vajhøj wrote:

But I am a bit skeptical about whether a String[] with 30K elements
is really the bottleneck.

If the real bottleneck is the OS calls to get next file, then
a filter like this will not help.

Why?

Because the listFiles() method will fetch the information
for all 30K files from the O/S, will construct 30K File objects
to represent them, and will submit all 30K File objects to the
FileFilter, one by one. The FileFilter will (very quickly)
reject 29.99K of the 30K Files, but ...

Will it?

Necessarily. As far as listFiles() knows, the FileFilter
might accept the very last File object given to it. Therefore,
listFiles() cannot fail to present that very last File -- and
every other File -- for inspection.
[ SNIP ]

I'd have to agree. A simple test shows this to be the case, but your
reasoning precludes having to run such a test in the first place.
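A simple test along those lines might look like the following sketch. The class and method names are hypothetical, and this is not the test Arved ran. It uses a filter that accepts at most maxFiles entries and counts how often accept() is called.

```java
import java.io.File;
import java.io.FileFilter;

/** Hypothetical helper: counts how often listFiles() consults a limiting FileFilter. */
public class FilterCallCount {

    /** Returns {acceptCalls, acceptedCount} for a filter that accepts at most maxFiles entries. */
    public static int[] countCalls(File dir, final int maxFiles) {
        final int[] calls = {0};
        FileFilter limiter = new FileFilter() {
            private int remaining = maxFiles;

            @Override
            public boolean accept(File pathname) {
                calls[0]++;               // every entry that reaches the filter is counted
                return remaining-- > 0;   // accept only the first maxFiles entries
            }
        };
        File[] accepted = dir.listFiles(limiter);
        int acceptedCount = (accepted == null) ? 0 : accepted.length;
        return new int[] {calls[0], acceptedCount};
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        int[] r = countCalls(dir, 10);
        System.out.println("accept() calls: " + r[0] + ", accepted: " + r[1]);
    }
}
```

If Eric's reasoning holds, pointing this at a directory of 30K files should report about 30K accept() calls but only 10 accepted files.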

My code "gets" the first N files from listFiles(), for some definition
of "first", but it certainly doesn't only get N files from the OS.

Based on Wojtek's later post, I'd be examining the entire problem in
more detail before arriving at a decent solution. I don't think most of
the problem pertaining to offering reasonable batches of files to a Java
program for processing is something that I'd address in Java anyway.

If OP happens to be on Java 7, then I will suggest using:

java.nio.file.Files.newDirectoryStream(dir)

It is a straightforward way of getting the first N files.

And it is as likely as the exception hack to avoid reading
all filenames from the OS.

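import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;

public class ListFilesWithLimit {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the directory handle when done
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get("/work"))) {
            Iterator<Path> dir = stream.iterator();
            int n = 0;
            while (dir.hasNext() && n < 10) {
                System.out.println(dir.next());
                n++; // count printed entries so the loop stops after 10
            }
        }
    }
}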

Arne
 
Eric Sosman

[...]
I'm not saying it's a great solution. But it's a far cry from a conclusion
that it simply cannot be done with the Java API as it exists now.

Did somebody say that? I certainly didn't -- indeed, part of
what you snipped from my post was a pointer to a perfectly clean
and well-documented Java SE API that does exactly what's needed.
 
John B. Matthews

Wojtek said:
Although it may be beyond your control, you should also critically
assess a design having tens of thousands of files in a single
directory.

Well of course.

The directory holds files which are uploaded by external events. If
there are a lot of events between application runs, then the number
of files can indeed reach large numbers.

Since this is happening on a server, and you can potentially have
many hundreds of people accessing at the same time (each with their
own directory), I was hoping to be able to "stage" file processing.

The:

public boolean accept(File pathname) {
return maxFiles-- > 0;
}

in FileFilter is interesting, but the file system nevertheless still
runs through the entire directory. Maybe FileFilter needs:

public boolean abort(File pathname);

Hmm, maybe I need a timed background process to move files to "holding"
directories which will be limited to a small number of files.

If Java 7 is available, a java.nio.file.WatchService may be helpful in
detecting (subsequent) changes while running.
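A minimal sketch of that idea, assuming Java 7: register the directory for ENTRY_CREATE events and poll for new entries. NewFileWatcher and pollCreated are hypothetical names. The service only reports changes made after registration, so files that were already present still have to be picked up with a directory listing.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Minimal sketch: report files created in one directory while the program runs. */
public class NewFileWatcher {

    /** Waits up to timeoutMillis for one batch of events and returns the created paths. */
    public static List<Path> pollCreated(WatchService ws, Path dir, long timeoutMillis)
            throws InterruptedException {
        List<Path> created = new ArrayList<Path>();
        WatchKey key = ws.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return created; // timed out, nothing happened
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // the context of an ENTRY_CREATE event is the name relative to dir
                created.add(dir.resolve((Path) event.context()));
            }
            // OVERFLOW means events were lost; a real service would rescan dir here
        }
        key.reset(); // re-arm the key so further events are delivered
        return created;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                for (Path p : pollCreated(ws, dir, 1000)) {
                    System.out.println("new file: " + p);
                }
            }
        }
    }
}
```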

<http://docs.oracle.com/javase/tutorial/essential/io/notification.html>
 
Roedy Green

Indeed, I suppose one could throw an exception from the FileFilter accept()
method to interrupt enumeration, if that's how listFiles() is implemented.
That would avoid the need to enumerate more than the needed number of
actual files.

You could resolve that question with some System.nanoTime() dumps: how
long does the first file take to show up relative to the others? IIRC it
builds the array and then feeds it to the filter, but that could just
have been someone explaining how it works conceptually.


I do know that Java takes a lot longer to scan a disk than C does.
Building the array first means less native code needed for
multiplatform implementation.
For most applications, you need to run every file name through the
filter so it does not matter which you do first. You would save
building File objects for items not passing the Filter.
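Roedy's experiment could be sketched roughly as follows (FilterTiming is a hypothetical name): record System.nanoTime() when listFiles() is called and at every accept() call. If the delay before the first call is large compared with the time spent between the first and last call, that suggests the whole name list is built before the filter sees anything.

```java
import java.io.File;
import java.io.FileFilter;

/** Sketch of the nanoTime experiment: when does the filter see its first and last entry? */
public class FilterTiming {

    /** Returns {nanos until first accept(), nanos from first to last accept(), call count}. */
    public static long[] time(File dir) {
        final long[] state = {0L, 0L, 0L}; // first call time, latest call time, count
        final long start = System.nanoTime();
        dir.listFiles(new FileFilter() {
            @Override
            public boolean accept(File f) {
                long now = System.nanoTime();
                if (state[2] == 0) {
                    state[0] = now; // first call
                }
                state[1] = now;     // latest call so far
                state[2]++;
                return false;       // results are not needed, only timings
            }
        });
        if (state[2] == 0) {
            return new long[] {0L, 0L, 0L}; // empty or unreadable directory
        }
        return new long[] {state[0] - start, state[1] - state[0], state[2]};
    }

    public static void main(String[] args) {
        long[] r = time(new File(args.length > 0 ? args[0] : "."));
        System.out.printf("until first accept(): %d us, first to last: %d us, calls: %d%n",
                r[0] / 1000, r[1] / 1000, r[2]);
    }
}
```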
--
Roedy Green Canadian Mind Products http://mindprod.com
The first 90% of the code accounts for the first 90% of the development time.
The remaining 10% of the code accounts for the other 90% of the development
time.
~ Tom Cargill Ninety-ninety Law
 
Jim Janney

Peter Duniho said:
[...]
Because the listFiles() method will fetch the information
for all 30K files from the O/S, will construct 30K File objects
to represent them, and will submit all 30K File objects to the
FileFilter, one by one. The FileFilter will (very quickly)
reject 29.99K of the 30K Files, but ...

Will it?

Necessarily. As far as listFiles() knows, the FileFilter
might accept the very last File object given to it. Therefore,
listFiles() cannot fail to present that very last File -- and
every other File -- for inspection.

Except in the way I already noted, you mean.
[...]
Indeed, I suppose one could throw an exception from the FileFilter accept()
method to interrupt enumeration, if that's how listFiles() is implemented.
That would avoid the need to enumerate more than the needed number of
actual files.

It would also avoid the burden of returning anything from
listFiles() -- like, say, the array of accepted files ...

As you've already agreed, it is possible for the FileFilter implementation
to store the results itself, obviating any need for the listFiles() method
to return successfully.

If it works (which is not assured...it depends on how listFiles() is
implemented in the first place), then yes, maybe it's a bit of a kludge.
But it's an easier, more portable kludge than writing some JNI-based
component and would in fact get the job done.
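A minimal sketch of that kludge, assuming Java 7 or later: the filter collects entries itself and throws a private exception once it has enough. FirstNByException and EnoughFiles are made-up names. Note that in the OpenJDK sources I have seen, listFiles(FileFilter) first obtains the complete name array via list() and only then calls the filter. On that implementation the abort saves File construction and filter calls for the remaining entries, not the OS enumeration itself.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "throw from accept()" kludge for stopping listFiles() early. */
public class FirstNByException {

    /** Private marker exception used only to stop the enumeration early. */
    private static final class EnoughFiles extends RuntimeException {
        EnoughFiles() {
            super(null, null, false, false); // no stack trace needed (Java 7+)
        }
    }

    public static List<File> firstN(File dir, final int maxFiles) {
        final List<File> collected = new ArrayList<File>();
        if (maxFiles <= 0) {
            return collected;
        }
        FileFilter collector = new FileFilter() {
            @Override
            public boolean accept(File f) {
                collected.add(f);
                if (collected.size() >= maxFiles) {
                    throw new EnoughFiles(); // abort listFiles() early
                }
                return false; // the collector, not listFiles(), holds the results
            }
        };
        try {
            dir.listFiles(collector);
        } catch (EnoughFiles expected) {
            // normal early exit: the collector already holds maxFiles entries
        }
        return collected;
    }
}
```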

Sometimes, when the library you're using doesn't provide exactly the
features you need, you wind up with a kludge. Oh well...shit happens.

I'm not saying it's a great solution. But it's a far cry from a conclusion
that it simply cannot be done with the Java API as it exists now.

It's an abuse of the notion of a filter, but yes, it can be made to
work. I stand corrected.
 
Jim Janney

Eric Sosman said:
[...]
I'm not saying it's a great solution. But it's a far cry from a conclusion
that it simply cannot be done with the Java API as it exists now.

Did somebody say that? I certainly didn't -- indeed, part of
what you snipped from my post was a pointer to a perfectly clean
and well-documented Java SE API that does exactly what's needed.

I said that. I was wrong.
 
Jim Janney

Arved Sandstrom said:
[ SNIP ]

I'd have to agree. A simple test shows this to be the case, but your
reasoning precludes having to run such a test in the first place.

My code "gets" the first N files from listFiles(), for some definition
of "first", but it certainly doesn't only get N files from the OS.

Based on Wojtek's later post, I'd be examining the entire problem in
more detail before arriving at a decent solution. I don't think most
of the problem pertaining to offering reasonable batches of files to a
Java program for processing is something that I'd address in Java
anyway.

There's also the problem of starvation, since we have no guarantees
concerning the order of entries in the directory.
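One way to sidestep the starvation problem is to choose entries by age rather than by directory order. The sketch below assumes last-modified time is a reasonable proxy for arrival order (an assumption; an upload could carry an older timestamp), and OldestFirst is a hypothetical name. It still has to read every entry, but it keeps only n of them in memory by using a bounded max-heap.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Sketch: pick the n oldest files by last-modified time, reading every entry once. */
public class OldestFirst {

    private static final class Entry {
        final Path path;
        final FileTime time;

        Entry(Path path, FileTime time) {
            this.path = path;
            this.time = time;
        }
    }

    public static List<Path> oldest(Path dir, int n) throws IOException {
        // max-heap on time: the newest of the current candidates sits on top
        Comparator<Entry> newestFirst = new Comparator<Entry>() {
            @Override
            public int compare(Entry a, Entry b) {
                return b.time.compareTo(a.time);
            }
        };
        PriorityQueue<Entry> heap = new PriorityQueue<Entry>(Math.max(1, n), newestFirst);
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                heap.add(new Entry(p, Files.getLastModifiedTime(p)));
                if (heap.size() > n) {
                    heap.poll(); // drop the newest, keeping the n oldest seen so far
                }
            }
        }
        List<Path> result = new ArrayList<Path>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().path); // newest candidate first
        }
        Collections.reverse(result);      // oldest first
        return result;
    }
}
```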
 
Wojtek

Arved Sandstrom wrote :
I'd be examining the entire problem in more detail before arriving at a
decent solution. I don't think most of the problem pertaining to offering
reasonable batches of files to a Java program for processing is something
that I'd address in Java anyway.

Events are on a per-user basis, that is to say each user has their own
event list.

The events are observed when the user logs in. Might be today or next
week.

To keep server processing reasonable I want to limit the number of
events sent back to the user at a time (10 was just a number I pulled
out of the air, obviously some tuning is required, and might even be
dynamic depending on how busy the rest of the system is).
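The staging idea could be sketched like this, assuming each user's events live in an inbox directory and that handing an event to the user means moving its file into a separate processing directory (both hypothetical layout choices; EventBatcher is a made-up name). Because handed-out files leave the inbox, each call simply takes the next batch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: hand out at most batchSize event files per call by moving them out of the inbox. */
public class EventBatcher {

    public static List<Path> nextBatch(Path inbox, Path processing, int batchSize)
            throws IOException {
        Files.createDirectories(processing);
        // First collect up to batchSize candidates, then close the stream before moving.
        List<Path> candidates = new ArrayList<Path>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inbox)) {
            for (Path p : stream) {
                if (candidates.size() >= batchSize) {
                    break; // stop reading the directory early
                }
                if (Files.isRegularFile(p)) {
                    candidates.add(p); // skip subdirectories and other non-files
                }
            }
        }
        List<Path> moved = new ArrayList<Path>();
        for (Path p : candidates) {
            // Files.move returns the path of the moved file
            moved.add(Files.move(p, processing.resolve(p.getFileName())));
        }
        return moved;
    }
}
```

A real service would also need a policy for events that fail after being moved, for example moving them back to the inbox.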

I have no control over the number of events, how often they occur, nor
how often a user logs in to look at them. 30K might be the high end,
though I need to cover it if I get a busy event set and a lazy user.

I might even set up a DB table for each user and store each event file
as it comes in. Then use the DB to get the file names.

Still white-boarding this...
 
Arved Sandstrom

Arved Sandstrom wrote :

Events are on a per-user basis, that is to say each user has their own
event list.

The events are observed when the user logs in. Might be today or next week.

To keep server processing reasonable I want to limit the number of
events sent back to the user at a time (10 was just a number I pulled
out of the air, obviously some tuning is required, and might even be
dynamic depending on how busy the rest of the system is).

I have no control over the number of events, how often they occur, nor
how often a user logs in to look at them. 30K might be the high end,
though I need to cover it if I get a busy event set and a lazy user.

I might even set up a DB table for each user and store each event file
as it comes in. Then use the DB to get the file names.

Still white-boarding this...

A file is not actually an unreasonable place to keep info for one event.
You want to store that information *someplace*, and a file is not worse
than a row in a DB table or a message on a queue somewhere. It's just
that we don't want to have tens or hundreds of thousands of files in one
directory.

SIDE NOTE: don't set up a DB table for each user. :)

Why not use the NIO2 watch service, and observe the event file input
directory for file creation events? On each such event do something with
the event file. Number of options here:

1. Move it into a user-specific directory;
2. Append it to a user-specific event file;
3. Put it in a DB.
etc.

I sort of like (2) myself.
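Option (2) might look roughly like the sketch below. It assumes, purely for illustration, that event file names start with the user id followed by an underscore; EventAppender and that naming rule are hypothetical. It would typically be called from the watch-service loop for each ENTRY_CREATE event.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of option (2): fold one incoming event file into a per-user log. */
public class EventAppender {

    /** Hypothetical rule: the user id is the part of the file name before the first '_'. */
    static String userOf(Path eventFile) {
        String name = eventFile.getFileName().toString();
        int underscore = name.indexOf('_');
        return underscore > 0 ? name.substring(0, underscore) : "unknown";
    }

    /** Appends the event's bytes to logDir/<user>.log, then deletes the event file. */
    public static Path append(Path eventFile, Path logDir) throws IOException {
        Files.createDirectories(logDir);
        Path userLog = logDir.resolve(userOf(eventFile) + ".log");
        byte[] content = Files.readAllBytes(eventFile);
        Files.write(userLog, content,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Files.delete(eventFile); // only after the append succeeded
        return userLog;
    }
}
```

Appending from a single watcher thread keeps writes to the same user log from interleaving.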

What do you mean by keeping server processing reasonable?

AHS
 
Robert Klemme

On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
If OP happens to be on Java 7, then I will suggest using:

java.nio.file.Files.newDirectoryStream(dir)

It is a straightforward way of getting the first N files.

And it is as likely as the exception hack to avoid reading
all filenames from the OS.

[ SNIP ]

For earlier Java versions we could emulate that with a second thread.

package file;

import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public final class ListFileTestThreaded2 {

private static final class CountFilterThread extends Thread
implements FileFilter {

private final File dir;
private final int maxFiles;
private final BlockingQueue<List<File>> queue;
private List<File> filesSeen = new ArrayList<File>();

public CountFilterThread(File dir, int maxFiles,
BlockingQueue<List<File>> queue) {
this.dir = dir;
this.maxFiles = maxFiles;
this.queue = queue;
}

@Override
public void run() {
try {
dir.listFiles(this);

if (filesSeen != null) {
send();
}
} catch (InterruptedException e) {
e.printStackTrace();
}
}

private void send() throws InterruptedException {
queue.put(filesSeen);
filesSeen = null;
}

@Override
public boolean accept(final File f) {
try {
if (filesSeen != null) {
filesSeen.add(f);

if (filesSeen.size() == maxFiles) {
send();
assert filesSeen == null;
}
}

return false;
} catch (InterruptedException e) {
throw new IllegalStateException(e);
}
}
}

private static final int[] LIMITS = { 10, 100, 1000, 10000,
Integer.MAX_VALUE };

public static void main(String[] args) throws InterruptedException {
for (final String s : args) {
System.out.println("Testing: " + s);
final File dir = new File(s);

if (dir.isDirectory()) {
for (final int limit : LIMITS) {
final SynchronousQueue<List<File>> queue = new
SynchronousQueue<List<File>>();
final CountFilterThread cf = new CountFilterThread(dir,
limit, queue);
cf.setDaemon(true);
final long t1 = System.nanoTime();
cf.start();
final List<File> entries = queue.take();
final long delta = System.nanoTime() - t1;
System.out.printf("It took %20dus to retrieve %20d files, %20.5fus/file.\n",
TimeUnit.NANOSECONDS.toMicros(delta), entries.size(),
(double) TimeUnit.NANOSECONDS.toMicros(delta)
/ entries.size());
}
} else {
System.out.println("Not a directory.");
}
}

System.out.println("done");
}

}

https://gist.github.com/4648256

It's not guaranteed, though, that this will be faster. And it's
definitely not simpler than the straightforward approach. :)

Cheers

robert
 
Arne Vajhøj

Arved Sandstrom wrote :

Events are on a per-user basis, that is to say each user has their own
event list.

The events are observed when the user logs in. Might be today or next week.

To keep server processing reasonable I want to limit the number of
events sent back to the user at a time (10 was just a number I pulled
out of the air, obviously some tuning is required, and might even be
dynamic depending on how busy the rest of the system is).

I have no control over the number of events, how often they occur, nor
how often a user logs in to look at them. 30K might be the high end,
though I need to cover it if I get a busy event set and a lazy user.

I might even set up a DB table for each user and store each event file
as it comes in. Then use the DB to get the file names.

Still white-boarding this...

Java 6 no DB:

Spread files out over some subdirs.
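For the Java 6 case, spreading files out could be as simple as hashing each file name into a fixed set of bucket directories. BucketedStore and the bucket count of 64 are arbitrary, hypothetical choices; the code sticks to java.io so it would also run on Java 6.

```java
import java.io.File;
import java.io.IOException;

/** Sketch: hash each file name into one of a fixed number of bucket directories. */
public class BucketedStore {

    static final int BUCKETS = 64; // arbitrary example value

    /** Maps a file name to a bucket directory name such as "b07". */
    public static String bucketFor(String fileName, int buckets) {
        int h = fileName.hashCode() & 0x7fffffff; // non-negative hash
        return String.format("b%02d", h % buckets);
    }

    /** Returns where a file with this name should live under root, creating its bucket. */
    public static File locate(File root, String fileName) throws IOException {
        File bucket = new File(root, bucketFor(fileName, BUCKETS));
        bucket.mkdirs(); // may fail harmlessly if another thread created it first
        if (!bucket.isDirectory()) {
            throw new IOException("Cannot create " + bucket);
        }
        return new File(bucket, fileName);
    }
}
```

With 30K files and 64 buckets, an even spread would leave roughly 470 files per directory.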

Java 7 no DB:

Use the new NIO capabilities.

DB:

Single table for all users and just use index.

Arne
 
Wojtek

Arne Vajhøj wrote :
DB:

Single table for all users and just use index.

Sigh, this is what comes out when I am really tired and the fermented
grape juice takes effect.

I have to stop thinking about this stuff on weekends...
 
Arne Vajhøj

On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
[ SNIP ]

It's not guaranteed, though, that this will be faster. And it's
definitely not simpler than the straightforward approach. :)

Is that much different from the throw exception in filter solution
except that it requires a lot more code?

Arne
 
Robert Klemme

On 1/26/2013 9:35 PM, Arne Vajhøj wrote:
[ SNIP ]

It's not guaranteed, though, that this will be faster. And it's
definitely not simpler than the straightforward approach. :)

Is that much different from the throw exception in filter solution
except that it requires a lot more code?

No.

robert
 
Wojtek

Knute Johnson wrote :
300003 files in the directory, almost 1.7GB of files, Windows XP, Java 7 and
it takes 16 ms to run. Somebody else should try this out on their computer
to see if it works as fast.

Ok, I'm back :)

I am using WinXP and Java 7, and the directory holds 30,001 32K files,
920MBytes

The Code:
----------------------------------------------------
package tester;

import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class NewsGroup
{
public static void main( String[] args ) throws IOException
{
int maxFiles = 10;

System.out.println( "Large File Number Tester" );

if (args.length > 0 && args[0].equals( "nio" ))
nioRun( "C:\\apps\\test", maxFiles );

else if (args.length > 0 && args[0].equals( "io" ))
ioRun( "C:\\apps\\test", maxFiles );

else
System.out.println( "NewsGroup io|nio" );

}

private static void ioRun( String filePath, int maxFiles ) throws
IOException
{
int i = 1;

System.out.println( "IO run" );
long start = System.currentTimeMillis();

File folder = new File( filePath );
File[] listOfFiles = folder.listFiles();

for (File file : listOfFiles)
{
System.out.println( "" + i + ": " + file.getName() );

if (++i > maxFiles)
break;
}

long stop = System.currentTimeMillis();

System.out.println( "Elapsed: " + (stop - start) + " ms" );
}

private static void nioRun( String filePath, int maxFiles ) throws
IOException
{
int i = 1;

System.out.println( "NIO run" );
long start = System.currentTimeMillis();

Path dir = FileSystems.getDefault().getPath( filePath );
DirectoryStream<Path> stream = Files.newDirectoryStream( dir );

for (Path path : stream)
{
System.out.println( "" + i + ": " + path.getFileName() );

if (++i > maxFiles)
break;
}

long stop = System.currentTimeMillis();

System.out.println( "Elapsed: " + (stop - start) + " ms" );
}
}
----------------------------------------------------

A batch file to run it:
----------------------------------------------------
@echo off
java -jar NewsGroup.jar %1
----------------------------------------------------

And the results:
----------------------------------------------------
C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 156 ms

C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 140 ms

C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 156 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 219 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 31 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 31 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 78 ms

C:\apps>
----------------------------------------------------

So NIO is about 4-5 times faster than IO. The first NIO run looks like
an anomaly, might be some JRE loading happening.

All the runs produce different timings, might be a Windows caching
effect. However the NIO is consistently much faster overall.
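Taking the elapsed times above at face value, the averages work out as below; how big the speed-up looks depends mostly on whether the slow 219 ms NIO run is counted.

```latex
\begin{aligned}
\bar t_{\mathrm{IO}}   &= \frac{156 + 140 + 156}{3} \approx 150.7\ \mathrm{ms} \\
\bar t_{\mathrm{NIO}}  &= \frac{219 + 31 + 31 + 78}{4} \approx 89.8\ \mathrm{ms}
   \quad\Rightarrow\quad \frac{150.7}{89.8} \approx 1.7 \\
\bar t'_{\mathrm{NIO}} &= \frac{31 + 31 + 78}{3} \approx 46.7\ \mathrm{ms}
   \quad\Rightarrow\quad \frac{150.7}{46.7} \approx 3.2 \\
\frac{\bar t_{\mathrm{IO}}}{31\ \mathrm{ms}} &\approx 4.9
\end{aligned}
```

So the "4-5 times" figure matches the two fastest NIO runs; averaged over runs, the ratio is somewhere between about 1.7 and 3.2.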
 
Robert Klemme

So NIO is about 4-5 times faster than IO. The first NIO run looks like
an anomaly, might be some JRE loading happening.

All the runs produce different timings, might be a Windows caching
effect. However the NIO is consistently much faster overall.

I am not convinced that this conclusion is warranted. There are a few
factors which I believe make your conclusion questionable:

- You included class loading time in your measurement. For example,
assuming that all io functionality is implemented on top of nio it would
be logical to expect more classes to be loaded. There are a number of
use cases where it matters - but there are also use cases where it
doesn't matter (long running servers).

- Generally we are dealing with quite low timings (around 100ms) and
relatively high variations. Also the test was made on Windows and the
System.currentTimeMillis() is known to be imprecise on that platform (in
the order of tens of milliseconds).

- Your io approach does not use FileFilter which some have suggested to
be a way to avoid constructing a large result array.

- The test is an artificial situation. With all factors like JVM
involved it may be that in a realistic application things look different
to an extent that the differences you measured here do not matter any more.
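A somewhat fairer comparison might warm up both approaches first (so class loading and JIT compilation fall outside the timed region), time with System.nanoTime(), and report the median of many runs. The sketch below is illustrative only: FirstNBenchmark is a hypothetical name, repeated runs mostly measure the warm-cache case, and a dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Sketch: warm up, time with nanoTime(), and report the median of many runs. */
public class FirstNBenchmark {

    public static int viaIo(File dir, int n) {
        File[] all = dir.listFiles(); // builds the full array
        return all == null ? 0 : Math.min(n, all.length);
    }

    public static int viaNio(Path dir, int n) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                if (count >= n) {
                    break; // stop after n entries
                }
                count++;
            }
        }
        return count;
    }

    /** Median of runs timings in microseconds, after warmup untimed calls. */
    public static long medianMicros(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - t0) / 1000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        final File dir = new File(args.length > 0 ? args[0] : ".");
        final int n = 10;
        Runnable io = new Runnable() {
            @Override
            public void run() {
                viaIo(dir, n);
            }
        };
        Runnable nio = new Runnable() {
            @Override
            public void run() {
                try {
                    viaNio(dir.toPath(), n);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        };
        System.out.println("io  median: " + medianMicros(io, 20, 51) + " us");
        System.out.println("nio median: " + medianMicros(nio, 20, 51) + " us");
    }
}
```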

Kind regards

robert
 
Wojtek

Robert Klemme wrote :
I am not convinced that this conclusion is warranted. There are a few
factors which I believe make your conclusion questionable:

- You included class loading time in your measurement. For example, assuming
that all io functionality is implemented on top of nio it would be logical to
expect more classes to be loaded. There are a number of use cases where it
matters - but there are also use cases where it doesn't matter (long running
servers).

The class loading would be part of a real project. The alternative is
to keep an object around which holds a link to a directory. Actually
many objects linked to many directories. Also the file list will change
as files are added and deleted.

- Generally we are dealing with quite low timings (around 100ms) and
relatively high variations. Also the test was made on Windows and the
System.currentTimeMillis() is known to be imprecise on that platform (in the
order of tens of milliseconds).

Fair enough.

- Your io approach does not use FileFilter which some have suggested to be a
way to avoid constructing a large result array.

Yes, but to filter the result still means loading each file name, then
checking to see if it matches the filter.

- The test is an artificial situation. With all factors like JVM involved it
may be that in a realistic application things look different to an extent
that the differences you measured here do not matter any more.

While the absolute times may be questionable, the relative times are
consistent.
 
