O.T. optimising file placement


Roedy Green

Let's say I had an SSD (see http://mindprod.com/bgloss/ssd.html). Let's
say I had a list of my most active files, some of which were data
files and some were Windows system files.

Is there a way to move these files to the SSD so that Java, the OS,
Windows, etc. act as if they did not notice the files had moved? Can
something be done with symbolic links? Is there a disk driver that uses
the SSD transparently as a cache for the most active files or
clusters?
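One answer to the symbolic-link question is exactly that: move the file to the SSD and leave a link at the old path, so anything opening the old name is redirected by the filesystem itself. A minimal sketch using Java 7's java.nio.file — the temp directories below are stand-ins for a real HDD and SSD, and on Windows the same effect comes from mklink (creating the link may need administrator rights):

```java
import java.nio.file.*;

// Sketch: relocate a "hot" file to the SSD and leave a symbolic link behind,
// so programs that open the old path are transparently redirected.
public class SymlinkSketch {
    static String moveAndLink(Path oldPath, Path ssdPath) throws Exception {
        Files.move(oldPath, ssdPath);                   // relocate the file
        Files.createSymbolicLink(oldPath, ssdPath);     // link at the old spot
        return new String(Files.readAllBytes(oldPath)); // read via the old path
    }

    public static void main(String[] args) throws Exception {
        Path hdd = Files.createTempDirectory("hdd"); // stand-in for the old disk
        Path ssd = Files.createTempDirectory("ssd"); // stand-in for the SSD
        Path hot = hdd.resolve("hot.dat");
        Files.write(hot, "frequently used data".getBytes());
        System.out.println(moveAndLink(hot, ssd.resolve("hot.dat")));
    }
}
```

Because the link is resolved by the filesystem, Java, the OS and ordinary programs see the old path working as before; only tools that deliberately inspect links notice the move.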
 

Daniel Pitts

Let's say I had an SSD (see http://mindprod.com/bgloss/ssd.html). Let's
say I had a list of my most active files, some of which were data
files and some were Windows system files.

Is there a way to move these files to the SSD so that Java, the OS,
Windows, etc. act as if they did not notice the files had moved? Can
something be done with symbolic links? Is there a disk driver that uses
the SSD transparently as a cache for the most active files or
clusters?

I seem to recall hearing about a way to tell Windows to use an SSD as a
kind of smart cache, but I haven't played around with it.
 

Jeff Higgins

Let's say I had an SSD (see http://mindprod.com/bgloss/ssd.html). Let's
say I had a list of my most active files, some of which were data
files and some were Windows system files.

Is there a way to move these files to the SSD so that Java, the OS,
Windows, etc. act as if they did not notice the files had moved? Can
something be done with symbolic links? Is there a disk driver that uses
the SSD transparently as a cache for the most active files or
clusters?
<http://en.wikipedia.org/wiki/ReadyBoost>
<http://en.wikipedia.org/wiki/Smart_Response_Technology>
<http://www.highpoint-tech.com/USA_new/series_RC3240X8.htm>
 

Roedy Green


Back before the Internet, I was pushing for what I call "Marthaing"
drives. We might get them any year now. The idea is that a microprocessor
inside the hard disk keeps reordering the tracks in the background so
that the most frequently used ones are near the outer rim.
It would have a delayed-write cache built in. Today it would be
easier to implement, since you could use a non-volatile fast SSD to
store the map of logical-to-physical track numbers and a fat delayed-write
cache. In the background the disk moves infrequently used tracks from
the outer rim toward the center in order to free up slots for writing.

People are willing to pay double or triple to go from 7200 RPM to 10,000 RPM.
With that sort of budget spent instead on internal cleverness, RAM
and SSD, you might be able to get considerably better performance.
 

Jeff Higgins

Back before the Internet, I was pushing for what I call "Marthaing"
drives. We might get them any year now.
Who is Martha? Back before the Internet I was advocating for squeezable
catsup bottles. We have 'em, but I haven't got a dime for 'em. :(
 

Lew

Back before the Internet, I was pushing for what I call "Marthaing"
drives. We might get them any year now. The idea is that a microprocessor
inside the hard disk keeps reordering the tracks in the background so
that the most frequently used ones are near the outer rim.

That won't help much.
It would have a delayed-write cache built in. Today it would be
easier to implement, since you could use a non-volatile fast SSD to
store the map of logical-to-physical track numbers and a fat delayed-write
cache. In the background the disk moves infrequently used tracks from
the outer rim toward the center in order to free up slots for writing.

People are willing to pay double or triple to go from 7200 RPM to 10,000 RPM.
With that sort of budget spent instead on internal cleverness, RAM
and SSD, you might be able to get considerably better performance.

Modern hard drives, pretty much all of them, have a buffer and a microprocessor
as part of the hardware. We're not going to get any "Marthaing" as you
describe it (wherever the heck /that/ term came from) because what they're
doing is already so effective.

What they mostly do is collect read and write requests and combine them in
elevator-seek order, along with full-track readahead. This optimizes disk
access for single sweeps of the drive heads. The on-drive buffer also holds
enough data for most reads and writes, overtaking any advantage that any
(perforce extremely slow) physical re-ordering of the tracks could accomplish.
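The elevator-seek (SCAN) ordering described above can be sketched in a few lines: split the pending track requests by which side of the head they fall on, serve everything on the sweep outward in ascending order, then everything on the way back in descending order. Track numbers here are invented for illustration:

```java
import java.util.*;

// Sketch of elevator (SCAN) scheduling: reorder pending track requests so
// the head sweeps in one direction, serving everything along the way,
// then sweeps back.
public class Elevator {
    static List<Integer> order(int head, List<Integer> pending) {
        List<Integer> up = new ArrayList<>(), down = new ArrayList<>();
        for (int t : pending) (t >= head ? up : down).add(t); // split by side
        Collections.sort(up);                   // outward sweep: ascending
        down.sort(Collections.reverseOrder());  // return sweep: descending
        up.addAll(down);
        return up;
    }

    public static void main(String[] args) {
        // head at track 50: serve 60, 80, 95 going out, then 20, 10 coming back
        System.out.println(order(50, List.of(10, 95, 60, 20, 80)));
    }
}
```

The payoff is fewer and shorter seeks per request than first-come-first-served order, provided there are enough requests queued at once to reorder.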
 

Jeff Higgins

I noticed that the tone is not as academic as Wikipedia's.
Yep ;)
"Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
which can make it difficult to pour from a glass bottle."
 

Daniel Pitts

Yep ;)
"Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
which can make it difficult to pour from a glass bottle."
Edible non-Newtonian fluids FTW
 

Martin Gregorie

Modern hard drives, pretty much all of them, have a buffer and a
microprocessor as part of the hardware. We're not going to get any
"Marthaing" as you describe it (wherever the heck /that/ term came from)
because what they're doing is already so effective.

What they mostly do is collect read and write requests and combine them
in elevator-seek order, along with full-track readahead. This optimizes
disk access for single sweeps of the drive heads.
Agreed, and a mainframe OS I was using in the early '70s (ICL's George 3)
was doing it back then and very effective it is too for speeding up disk
access. Back in the day it pushed the speed of the 2800 rpm, 60 MB
washing-machine sized disk drives up from around 8 accesses/sec to
something like 20-30 per sec.

However, it's ineffective unless there are many active processes
simultaneously requesting disk i/o. If all the requests come from one
single-threaded process then it can't optimize head movement, because
there's never more than one pending request at a time. I know this is
reductio ad absurdum, but it does make the point that a small active
process population is unlikely to be optimised as well as a large one.
This is relevant today for almost all single-user workstations,
regardless of whether they are running Windows, Linux or OS X. Since the
majority of applications run on these machines are single threaded, about
the only time you have more than one process accessing the disk is when
the user is hammering away at a task, be it word processing, spreadsheet,
browser or IDE, and the mail reader, sitting in the background, finds some
mail waiting.
The on-drive buffer
also holds enough data for most reads and writes, overtaking any
advantage that any (perforce extremely slow) physical re-ordering of the
tracks could accomplish.
Yep, the on-drive buffer will almost always be capable of holding several
physical tracks and, in addition, on a *NIX system anyway, all RAM not
occupied by running processes and their data will contain disk buffers.
 

Arne Vajhøj

Agreed, and a mainframe OS I was using in the early '70s (ICL's George 3)
was doing it back then and very effective it is too for speeding up disk
access. Back in the day it pushed the speed of the 2800 rpm, 60 MB
washing-machine sized disk drives up from around 8 accesses/sec to
something like 20-30 per sec.

However, it's ineffective unless there are many active processes
simultaneously requesting disk i/o. If all the requests come from one
single-threaded process then it can't optimize head movement, because
there's never more than one pending request at a time. I know this is
reductio ad absurdum, but it does make the point that a small active
process population is unlikely to be optimised as well as a large one.
This is relevant today for almost all single-user workstations,
regardless of whether they are running Windows, Linux or OS X. Since the
majority of applications run on these machines are single threaded, about
the only time you have more than one process accessing the disk is when
the user is hammering away at a task, be it word processing, spreadsheet,
browser or IDE, and the mail reader, sitting in the background, finds some
mail waiting.

Most OS'es support async IO.

Arne
 

Gene Wirchenko

Yep ;)
"Tomato ketchup is a pseudoplastic — or "shear thinning" substance —
which can make it difficult to pour from a glass bottle."

"Would you like fries with that?"

Sincerely,

Gene Wirchenko
 

Martin Gregorie

Most OS'es support async IO.
Yes, I know, but it's not relevant to a single-threaded process, since its
logic generally requires it to wait for a read or write to complete
before it continues[1]. Hence my comment that this prevents head movement
being optimized unless a lot of processes are active, because there's only
one outstanding IOP per process.

[1] unless you're deliberately doing async i/o using poll() or
select() (in C) or nio (in Java), in which case the process is often
best regarded as a half-way house between single and multi-threaded
logic.
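For the nio case in the footnote, here is a small sketch of what "deliberately doing async i/o" looks like in Java: with AsynchronousFileChannel a single thread can have two reads outstanding at once, which is exactly what gives the drive's scheduler something to reorder. The file and sizes are invented for the demo:

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.*;
import java.util.concurrent.Future;

// Sketch: one thread, two disk reads in flight at the same time.
public class AsyncReadSketch {
    static int readBoth(Path file) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer a = ByteBuffer.allocate(4096);
            ByteBuffer b = ByteBuffer.allocate(4096);
            Future<Integer> first  = ch.read(a, 0);    // both requests are now
            Future<Integer> second = ch.read(b, 4096); // pending simultaneously
            return first.get() + second.get();         // wait for completion
        }
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("demo", ".bin");
        Files.write(f, new byte[8192]);
        System.out.println(readBoth(f) + " bytes read");
        Files.delete(f);
    }
}
```

Between issuing the reads and calling get(), the thread is free to do other work — the "half-way house" behaviour described above, without a second thread.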
 

Martin Gregorie

Most OS'es support async IO.
Yes, I know, but it's not relevant to a single-threaded process, since
its logic generally requires it to wait for a read or write to complete
before it continues[1]. Hence my comment that this prevents head
movement being optimized unless a lot of processes are active, because
there's only one outstanding IOP per process.

[1] unless you're deliberately doing async i/o using poll() or
select() (in C) or nio (in Java), in which case the process is
often best regarded as a half-way house between single and
multi-threaded logic.
There are some exceptions to this. For example, if you are reading a
file sequentially, the OS may prefetch blocks you have not yet
requested, and have multiple reads outstanding as a result.
Fair point, and I've seen blinding speed from reads where the disk
drivers used track reads, but it still doesn't affect my point that
there's still only one I/O request in the queue per active single-threaded
process. Head movement optimisation is simply sidestepped in
this case.
Depending on the OS and how the IO is being handled, a write may appear
to be complete from the program's point of view once the data has been
copied to a kernel buffer. The OS may be writing out modified blocks,
including swap space blocks, at any time.
Again agreed: it's fair to regard a write as complete from the program's
POV as soon as it can reread the block/record - something that many
indexed sequential access schemes need to do to re-establish a 'current
record' pointer.
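The point about writes "completing" at the kernel-buffer copy shows up directly in Java: FileChannel.write() returns once the data is in the buffer cache, and only an explicit force() waits for the physical medium. A sketch with an invented temp file:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Sketch: write() returns after the copy to kernel buffers;
// force() is what actually blocks until the data reaches the disk.
public class WriteBehindSketch {
    static long writeThenSync(Path file, byte[] data) throws Exception {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
            ch.write(ByteBuffer.wrap(data)); // "complete" from the program's POV
            ch.force(true);                  // now it is really on the platters
            return ch.size();
        }
    }

    public static void main(String[] args) throws Exception {
        Path f = Files.createTempFile("wb", ".dat");
        System.out.println(writeThenSync(f, new byte[512]) + " bytes synced");
        Files.delete(f);
    }
}
```

The gap between those two calls is the write-behind window the OS uses to batch and reorder flushes of modified blocks.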
 

Martin Gregorie

Sounds like disk optimizations would help that system.
Probably not - they are all cron jobs and hence get run sequentially.
In your particular case you have no need of optimization of your disk
processes. You don't mention it, but by omission I will grant you that
virtual memory on your system does not seriously contend for disk
either.
Well spotted. My type of load almost never swaps. That was the case with
the old 512 MB RAM box and is doubly true with its replacement (4 GB
RAM), but that still doesn't stop me setting swap space at twice RAM.

In fact the only program I have that does use gobs of RAM is a JavaMail +
Postgres app, and I'm not sure if it's a problem due to JavaMail's queueing
or if I've got overly long-lived Object instances. Tracking this down is
on my to-do list. All I know at present is that the same program using
the same JVM uses gobs more RAM on the new machine (which is 6 times
faster as well as having 8x more RAM), so it might simply be a case of
persuading the GC to run more often.
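A first step in tracking that down (a generic sketch, not the JavaMail app itself) is comparing used heap before and after a GC hint: if usage collapses after the hint, the objects were reclaimable and it is a GC-tuning matter rather than a leak.

```java
// Sketch: measure used heap before and after a GC hint, to tell an
// idle-but-reclaimable heap from a genuine leak.
public class HeapSketch {
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();
        System.gc();                 // only a hint; the JVM may ignore it
        long after = usedBytes();
        System.out.printf("used: %d KB -> %d KB%n",
                before / 1024, after / 1024);
    }
}
```

If the heap stays high even after collection, a heap dump and a profiler are the next stop.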
But a typical consumer scenario is to listen to a stream while
surfing the web on Windows with several chat windows open, causing
multiple disk IO ops on a constant basis of themselves and also putting
pressure on virtual memory. Even such a single-user system can benefit
from elevator seeking and on-disk buffers.
I'm not saying head movement optimisation is a bad thing, just that it
can be difficult to get enough queued requests for it to work without a
large population of active processes that all do a lot of disk accesses.

You may well be right about the typical consumer setup: I lack any
experience of that; all I understand is the pattern that my own use
generates. However, I would point out that streamed music or video may
never touch the disk (though of course a torrent will). The amount of
disk i/o due to chat/IM/Twitter/web browsers may be less than we'd expect
because it's (a) very bursty and (b) disk i/o time is vastly outweighed by
human reading and typing time.
Consider also that burstiness of demand does not argue against the need
for optimization, really. During bursts the optimization helps, and a
user might complain if their disks got weird once an hour.
Sure, but the user scans and interacts with one program at a time, which
may well be single threaded, for a few minutes before switching to
another. This tends to produce widely separated bursts of i/o from one or
two processes.
Regardless, if you don't need optimization why worry? It's like the Pope
comparing brands of condoms.
Like it!
Again, we don't excoriate the value of optimizations by citing examples
where optimization isn't needed. We evaluate optimizations by how useful
they are when they are needed.
I wasn't intending to do that, having seen just how well head scheduling
works. I merely intended to point out that there are corner cases where
such algorithms don't help - but are not a hindrance either.
 

Arne Vajhøj

You mean irrelevant.

First of all, single-user does not mean single-process. All these
systems are multi-process, with multiple processes running at all times.

And multithreaded.

My Win7 right now has 105 processes with 1858 threads!
the majority of applications run on these machines are single threaded

Show us some data. Please don't throw around terms like "the majority of
applications [that] run on these machines" unless you have data.

Only 6 out of 105 are single-threaded here.

Arne
 

Arne Vajhøj

Most OS'es support async IO.
Yes, I know, but it's not relevant to a single-threaded process, since its
logic generally requires it to wait for a read or write to complete
before it continues[1]. Hence my comment that this prevents head movement
being optimized unless a lot of processes are active, because there's only
one outstanding IOP per process.

[1] unless you're deliberately doing async i/o using poll() or
select() (in C) or nio (in Java), in which case the process is often
best regarded as a half-way house between single and multi-threaded
logic.

I am talking about deliberate, not accidental, async IO.

And you may consider it half single-threaded, half multi-threaded,
but when there is only one thread it is usually just called
single-threaded.

Arne
 
