How to secure documents on a server

Jorge

Ok, why should it take longer to pull a large file out of one location in
a database than out of one location in a filesystem?

I think the point is that retrieving such a large chunk of data from a DB
might momentarily impact the performance of subsequent DB operations.
Think about what happens to the SQL database caches.

--Jorge.
 
Joost Diepenmaat

Jerry Stuckle said:
But try putting 100K files in a directory on the file system and see
how much it slows things down. Whereas the database will hardly
notice any performance decrease.

That really depends on the filesystem. But yeah, most common file
systems don't like that. In any case, neither relational databases nor
normal file systems are optimized for this kind of use - especially
not if the blobs are large.

In other words, your mileage may vary. See also
http://perspectives.mvdirona.com/20...ystackEfficientStorageOfBillionsOfPhotos.aspx
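
One common workaround for the "100K files in one directory" problem (not something proposed in this thread, just a hedged illustration) is to spread the files over subdirectories derived from a hash of the file name. The sketch below is Python; `sharded_path` and its parameters are hypothetical names.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, file_id: str, levels: int = 2) -> Path:
    """Return a nested path for file_id so no single directory grows huge.

    Each level uses two hex characters of the SHA-256 digest, which gives
    256 possible subdirectories per level.
    """
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, file_id)


if __name__ == "__main__":
    # The same ID always maps to the same location.
    print(sharded_path(Path("/data/pics"), "12345.jpg"))
```

With two levels there are 65,536 leaf directories, so 100K files end up with only a handful of entries per directory. Whether this beats a database still depends on the filesystem and the workload.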
 
The Natural Philosopher

Michael said:
.oO(The Natural Philosopher)


Just think about what steps are required in order to get a file 1) from
a DBMS, 2) from a location outside the doc root, 3) directly with a URL:

1. Storage file -> DBMS -> Socket -> Script -> Webserver -> Browser
2. File -> Script -> Webserver -> Browser
3. File -> Webserver -> Browser


The DB also has to access the disk. Additional overhead is caused by the
SQL processing itself and the transfer of the data to the requesting
script.

Yes, but that is pretty insignificant compared with, e.g., the disk speed
and download bandwidth limits, which are the same in both cases.
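
For path 2 in the list above (File -> Script -> Webserver -> Browser), a minimal sketch of the script part might look like this. It is written in Python rather than PHP, and `resolve_document`, `stream_document`, and the store directory are hypothetical names, not anything from the thread. The idea is that the files live outside the doc root and only the script can reach them.

```python
from pathlib import Path
from typing import Iterator


def resolve_document(store: Path, name: str) -> Path:
    """Map a requested name to a file under store, rejecting path escapes."""
    root = store.resolve()
    candidate = (root / name).resolve()
    # Refuse anything like "../secret.txt" that resolves outside the store.
    if root not in candidate.parents:
        raise PermissionError(f"{name!r} is outside the document store")
    if not candidate.is_file():
        raise FileNotFoundError(name)
    return candidate


def stream_document(path: Path, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file in chunks so the script never holds it all in memory."""
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk
```

An access check (login, permissions) would go before `stream_document` is called; that part depends entirely on the application.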
 
Bart Van der Donck

Jerry said:
You're just using the database for what it's made for - storing and
accessing data.  It's not at all disastrous - in fact, if you get enough
files in the database, performance may actually improve over that file
system's.

I would be interested to see some articles or benchmarks on this
issue. Got any? In my experience I've always encountered the opposite:
the performance of both MySQL and MS Access decreases dramatically
with larger BLOBs. I'm working with many GBs of pictures, for which I
store nothing in tables (ID of the record = name of the picture; the
application ties pics to IDs). I've had good experiences with this
approach, even under heavy load. But I'm always interested to learn
how this strategy could be improved.
 
Paul Lautman

Jerry said:
Paul,

But try putting 100K files in a directory on the file system and see
how much it slows things down. Whereas the database will hardly
notice any performance decrease.

I have always found it slightly slower to get the equivalent file from the
database rather than from the file system. But as I say, it doesn't bother
me. If the application is generally better with the files in a database,
that's where they go. If the application is easier with them on disk, then I
put them there. Likewise, if something works better with static html pages I
will use them. When it comes down to it, we have a vast range of
technologies at our disposal. I look upon my role as being good at picking
the right one for the right task. There is always a balance to be struck
between speed of processing, functionality, ease of maintenance, ...
 
Jerry Stuckle

Bart said:
I would be interested to see some articles or benchmarks on this
issue. Got any? In my experience I've always encountered the opposite:
the performance of both MySQL and MS Access decreases dramatically
with larger BLOBs. I'm working with many GBs of pictures, for which I
store nothing in tables (ID of the record = name of the picture; the
application ties pics to IDs). I've had good experiences with this
approach, even under heavy load. But I'm always interested to learn
how this strategy could be improved.

Over 20 years of experience doing it, starting with DB2 on mainframes.

But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
(e-mail address removed)
==================
 
Jerry Stuckle

Paul said:
I have always found it slightly slower to get the equivalent file from the
database rather than from the file system. But as I say, it doesn't bother
me. If the application is generally better with the files in a database,
that's where they go. If the application is easier with them on disk, then I
put them there. Likewise, if something works better with static html pages I
will use them. When it comes down to it, we have a vast range of
technologies at our disposal. I look upon my role as being good at picking
the right one for the right task. There is always a balance to be struck
between speed of processing, functionality, ease of maintenance, ...

Yes, but with that many files in a directory, even Linux slows down
quite a bit. It isn't made to handle that many different files.

But for a good database, you're just starting.

 
Jerry Stuckle

Jorge said:
I think the point is that retrieving such a large chunk of data from a DB
might momentarily impact the performance of subsequent DB operations.
Think about what happens to the SQL database caches.

--Jorge.

Not at all, if the database is properly configured.

 
The Natural Philosopher

Paul said:
I have always found it slightly slower to get the equivalent file from the
database rather than from the file system. But as I say, it doesn't bother
me. If the application is generally better with the files in a database,
that's where they go. If the application is easier with them on disk, then I
put them there. Likewise, if something works better with static html pages I
will use them. When it comes down to it, we have a vast range of
technologies at our disposal. I look upon my role as being good at picking
the right one for the right task. There is always a balance to be struck
between speed of processing, functionality, ease of maintenance, ...

Yes. Exactly. The key is to not get religious about it ... "the RIGHT way
is to ..."

Advantages of the database...

- single-point backup of all data
- definitely not directly accessible via HTTP
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.


HOWEVER it is perfectly possible to have a separate database, even on a
separate machine, to do the serving, if it gets too onerous.
 
Paul Lautman

The Natural Philosopher said:
Advantages of the database...

- single-point backup of all data
- definitely not directly accessible via HTTP
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with
the file to be served (i.e. you MIGHT want a description of what it
is).

Also, and this is the bit I really like, when you delete the record the file
automatically goes with it.
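
To make the database-side advantages concrete, here is a minimal sketch in Python. SQLite is used as a stand-in for MySQL so the example is self-contained; the `documents` table, its columns, and the helper functions are all hypothetical. It shows the file body living next to its description, a search on that description, and the body disappearing together with the record.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table holding each file next to its metadata."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " description TEXT,"
        " body BLOB NOT NULL)"
    )
    return conn


def add_document(conn: sqlite3.Connection, name: str,
                 description: str, body: bytes) -> int:
    """Store the file contents and return the new record's ID."""
    cur = conn.execute(
        "INSERT INTO documents (name, description, body) VALUES (?, ?, ?)",
        (name, description, body),
    )
    conn.commit()
    return cur.lastrowid


def find_by_description(conn: sqlite3.Connection, term: str) -> List[Tuple[int, str]]:
    """Search the metadata without touching the BLOB column."""
    rows = conn.execute(
        "SELECT id, name FROM documents WHERE description LIKE ?",
        (f"%{term}%",),
    )
    return rows.fetchall()


def delete_document(conn: sqlite3.Connection, doc_id: int) -> None:
    """Deleting the record removes the stored file with it."""
    conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
```

With files on disk, the equivalent of `delete_document` needs a second step to unlink the file, and the two steps can get out of sync.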
 
Paul Lautman

Jerry said:
Paul,

But try putting 100K files in a directory on the file system and see
how much it slows things down. Whereas the database will hardly
notice any performance decrease.

Actually I guess I ought to qualify my timings comment. I have no proof that
it was the database that was slowing things down per se. Serving the images
required invoking a load of script, which wasn't going to help, and of course
the MySQL installation was on a shared server, so there was no opportunity to
optimise the settings for this task.
 
Michael Fesser

.oO(The Natural Philosopher)
Yes. Exactly. The key is to not get religious about it ... "the RIGHT way
is to ..."

Advantages of the database...

- single-point backup of all data
- definitely not directly accessible via HTTP
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.

Some more pros and cons:

http://groups.google.com/group/alt.php.sql/msg/c0e4dd4f90eafa84

Micha
 
Jorge

Yes. Exactly. The key is to not get religious about it ... "the RIGHT way
is to ..."

In fact, a filesystem is a ~DBMS that handles just one type of data
(files). But the amount of metadata that a filesystem (easily) keeps or
provides about its data (the files) is limited, while there's no limit
to the amount of metadata that can be (easily) saved and retrieved in a
DBMS. Both are (most likely) equally well optimized to do their jobs
efficiently. The APIs to get to the data are completely different: one
is pretty familiar and the other not so much. I love the idea of
single-file backups (as in a DBMS). OTOH, the filesystem approach
is better suited to incremental backups.

--Jorge.
 
Bart Van der Donck

Jerry said:
[...]
But don't count MS Access in there.  Use a real database.  MySQL
qualifies.  And it has to be configured properly.

Not the real communism? [*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not something I do myself; it is done by sysadmins in one of the
giant datacenters, who stick to one config for the entire park.

BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS.  Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database.  And in some cases it actually
runs faster.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+ intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system path (usr/my/path/to/pics/),
adds the ID to it, adds .jpg to it, tests if the file exists (-e).
- If yes, use the URL path instead of the system path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.

(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
- Output with <img>.

(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affect file system only.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affect DB only.

It is my experience that (1) has huge memory benefits compared to
(2).

The difference between (3) and (4) is not so clear, especially because
MySQL probably optimizes this process. I think in practice you would
see that (3) is faster for environment A, and (4) for environment B,
but never with really considerable differences.

And (1) and (2) are much more important since they count for 99.x% of
the queries in my case.

[*] -"Communism is great." -"But look how things went in the USSR."
-"That was not the real communism."
[**] Many tendencies in MS Access are a good thermometer for general
database issues; MS Access is just the first that fails :)
 
The Natural Philosopher

Bart said:
Jerry said:
[...]
But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

Not the real communism? [*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not something I do myself; it is done by sysadmins in one of the
giant datacenters, who stick to one config for the entire park.

BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+ intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system path (usr/my/path/to/pics/),
adds the ID to it, adds .jpg to it, tests if the file exists (-e).
- If yes, use the URL path instead of the system path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.

(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).

Unnecessary: Just..
- Output with <img>.

...pointing to a second php script that loads the BLOB and spits it out.

(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affect file system only.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affect DB only.

It is my experience that (1) has huge memory benefits compared to
(2).

Well the way you have it, it duplicates the file in its entirety, which
is inefficient.

The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though: Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes on ... and memory is cheap. Cheaper than CPU anyway.



Reading a record has to be something a database is highly optimised for.
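
The "second script that loads the BLOB and spits it out" can be sketched roughly as follows. This is an illustration in Python with SQLite standing in for PHP and MySQL; the `images` table, its columns, and both functions are hypothetical. Note that it reads the whole BLOB into the script's memory before handing it on, which is the memory cost conceded above.

```python
import sqlite3
from typing import List, Optional, Tuple

Header = Tuple[str, str]


def fetch_image(conn: sqlite3.Connection, image_id: int) -> Optional[Tuple[str, bytes]]:
    """Read one image record; returns (mime_type, body) or None if absent."""
    row = conn.execute(
        "SELECT mime_type, body FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else (row[0], row[1])


def image_response(conn: sqlite3.Connection,
                   image_id: int) -> Tuple[str, List[Header], bytes]:
    """Build (status, headers, body) for a hypothetical image endpoint.

    The page itself would only contain something like
    <img src="image?id=42">, never a filesystem path.
    """
    found = fetch_image(conn, image_id)
    if found is None:
        return "404 Not Found", [("Content-Type", "text/plain")], b"not found"
    mime_type, body = found
    headers = [("Content-Type", mime_type), ("Content-Length", str(len(body)))]
    return "200 OK", headers, body
```

The web server only relays the bytes the script produces, so any permission check can sit in `image_response` before the query runs.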
 
Bart Van der Donck

The Natural Philosopher said:
The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though: Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes on ... and memory is cheap. Cheaper than CPU anyway.

All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.
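
A minimal sketch of this filesystem-only approach, written in Python rather than the pseudocode above. `image_tags`, `pics_dir`, and `url_prefix` are hypothetical names; the IDs are assumed to come from the `SELECT id` query, and the existence check mirrors the -e test described earlier in the thread.

```python
import html
from pathlib import Path
from typing import Iterable, List


def image_tags(ids: Iterable[int], pics_dir: Path, url_prefix: str) -> List[str]:
    """Turn record IDs into <img> tags, skipping IDs with no file on disk.

    No binary data is read here: only the file's existence is checked,
    and the web server serves the picture itself from url_prefix.
    """
    tags = []
    for image_id in ids:
        filename = f"{int(image_id)}.jpg"  # int() rejects non-numeric IDs
        if (pics_dir / filename).is_file():
            src = html.escape(f"{url_prefix}/{filename}", quote=True)
            tags.append(f'<img src="{src}">')
    return tags
```

For example, `image_tags([1, 2], Path("usr/my/path/to/pics"), "/pics")` yields one tag per ID whose `.jpg` exists in that directory.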
 
AlmostBob

The Natural Philosopher said:
The way I do it, it streams off the database via the unix socket into
PHP memory space, and is outputted from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though: Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes on ... and memory is cheap. Cheaper than CPU anyway.

All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.

--
Bart


But Bart, View Source shows the true path to your image. Not good.
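
One possible way around the exposed path (an assumption on my part, not something anyone in the thread describes) is to hand out a signed, expiring URL that points at a script, and let the script decide whether to serve the file. A rough Python sketch; `SECRET_KEY`, the `/image` endpoint, and the function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical key; a real one would be random and kept outside the web root.
SECRET_KEY = b"change-me"


def sign(image_id: int, expires_at: int) -> str:
    """Return an HMAC-SHA256 signature over the ID and its expiry time."""
    message = f"{image_id}:{expires_at}".encode("ascii")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def image_url(image_id: int, expires_at: int) -> str:
    """Build the URL that goes into the page instead of the real file path."""
    return f"/image?id={image_id}&expires={expires_at}&sig={sign(image_id, expires_at)}"


def is_valid(image_id: int, expires_at: int, sig: str, now: int) -> bool:
    """Check the signature in constant time and reject expired links."""
    expected = sign(image_id, expires_at).encode("ascii")
    return now <= expires_at and hmac.compare_digest(expected, sig.encode("utf-8"))
```

View Source then shows only the ID, expiry, and signature. The files themselves can sit outside the doc root, at the cost of running a script for every image.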
 
