Finding the first file in a directory, fast, for a message queue


Alex Maccaw

I have a file-based message queue, and messages are stored five folders down.

When a request for the next msg is received, I need to grab the first
file I can find, as fast as I can.

I've tried Dir.glob and taking only the first file. That is an awful
way of doing it, though, since it builds the full list of matching
paths in memory before selecting the first.

A bit better is Find.find, which finds files incrementally. However,
this still takes about 0.005 seconds (I presume since it's also
'finding' directories).

Is there a faster way to do this?
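
For reference, the incremental Find.find approach described above might look roughly like the sketch below. The method name first_file is a hypothetical helper, not something from the original post; the point is only that returning from inside the block stops the walk as soon as a regular file turns up.

```ruby
require "find"

# Hypothetical helper: return the first regular file found under root,
# or nil if the tree contains no files.
# Find.find yields paths one at a time (directories included), so the
# early return ends the traversal at the first file.
def first_file(root)
  Find.find(root) do |path|
    return path if File.file?(path)
  end
  nil
end
```

Directories are yielded as well, which is why each path is checked with File.file? before it is returned.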
 

Robert Klemme

2008/1/8, Alex Maccaw said:
> I have a file-based message queue, and messages are stored five folders down.
>
> When a request for the next msg is received, I need to grab the first
> file I can find, as fast as I can.
>
> I've tried Dir.glob and taking only the first file. That is an awful
> way of doing it, though, since it builds the full list of matching
> paths in memory before selecting the first.
>
> A bit better is Find.find, which finds files incrementally. However,
> this still takes about 0.005 seconds (I presume since it's also
> 'finding' directories).
>
> Is there a faster way to do this?

You find 5 ms long for a file system access? I'd say that's
pretty fast considering what you do (a recursive search). I doubt you
will get much improvement as long as you always access the file system
for your search. If you know how often the files change, you could
keep the file system contents in memory and update them only every n
seconds or minutes, or have a background thread that continuously
updates your in-memory representation.

Kind regards

robert
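
A minimal sketch of the "keep the listing in memory and refresh every n seconds" idea from the post above. The class name CachedListing and the max_age parameter are assumptions made for illustration, not part of any library or of the original poster's code. It rescans the tree only when the cached list is older than max_age seconds.

```ruby
# Hypothetical cache of the file paths under a root directory.
class CachedListing
  def initialize(root, max_age = 5.0)
    @root = root
    @max_age = max_age   # seconds before the cached list counts as stale
    @paths = []
    @loaded_at = nil
  end

  # Return the cached paths, rescanning first if the cache is stale.
  def paths
    refresh if stale?
    @paths
  end

  # Walk the tree once and remember every regular file found.
  def refresh
    pattern = File.join(@root, "**", "*")
    @paths = Dir.glob(pattern).select { |p| File.file?(p) }.sort
    @loaded_at = Time.now
  end

  private

  def stale?
    @loaded_at.nil? || Time.now - @loaded_at > @max_age
  end
end
```

The trade-off is freshness: a file written after the last scan stays invisible until the next refresh, so max_age should reflect how often files actually change.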
 

Alex Maccaw

> You find 5ms when accessing the file system long? I'd say that's
> pretty fast considering what you do (recursive search). I doubt you
> will get much improvement as long as you always access the file system
> for your search. If you know the change frequency of files then you
> could store file system contents in memory and only update every n
> seconds / minutes or whatever or have a background thread that
> continuously updates your in memory representation.

Well, it's long compared to generating folder names, a guid, and writing
files. It means that I can publish about 1000 msgs per second onto my
queue, but only pull off 243 msgs per second.
 

Robert Klemme

2008/1/8, Alex Maccaw said:
> Well, it's long compared to generating folder names, a guid, and writing
> files. It means that I can publish about 1000 msgs per second onto my
> queue, but only pull off 243 msgs per second.

Here's another option: find all files and put them in a queue. Only
redo the search when the queue is empty. This might pay off overall.

robert
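
The "scan once, refill only when empty" suggestion above could be sketched as follows. FileQueue and next_file are hypothetical names chosen for illustration. The sketch assumes the consumer deletes or moves each message file once it has been handled, since otherwise the next scan would pick the same file up again.

```ruby
# Hypothetical in-memory queue of message file paths.
class FileQueue
  def initialize(root)
    @root = root
    @pending = []
  end

  # Return the next pending path, rescanning the tree only when the
  # in-memory queue has run dry. Returns nil if no files exist.
  def next_file
    refill if @pending.empty?
    @pending.shift
  end

  private

  # One full recursive scan; its cost is spread over every file found.
  def refill
    pattern = File.join(@root, "**", "*")
    @pending = Dir.glob(pattern).select { |p| File.file?(p) }.sort
  end
end
```

With this shape the recursive search is paid once per batch rather than once per message, so the per-message cost is mostly an Array#shift.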
 

Alex Maccaw

Robert said:
> Here's another option: find all files and put them in a queue. Only
> redo search when the queue is empty. This might pay off over all.

Yes, that's what I've done. Message polling is now the same speed as
publishing (so the overall speed is about 10000 msg per second). If you're
interested, here's the queue:

http://code.google.com/p/sparrow
 
