OT: why are LAMP sites slow?


Steve Holden

Jack said:
[reordered Paul's email a bit]

This is the kind of answer I had in mind.


*ding*ding*ding* The mistake I've made most frequently is using
a database in applications. YAGNI. Using a database at all has its
own overhead. Using a database badly is deadly. Most sites would
benefit from ripping out the database and doing something simpler.
Refactoring a database on a live system is a giant pain in the ass;
simpler file-based approaches make incremental updates easier.

The Wikipedia example has been thrown around; I haven't looked at the
code either. Except for search, why would they need a database to
look up an individual WikiWord? Going to the database requires reading
an index, when pickle.load(open('words/W/WikiWord', 'rb')) would seem sufficient.
[...]
I don't mean LAMP is inherently slow, I just mean that a lot of
existing LAMP sites are observably slow.


A lot of these are just implementation issues. Going the dumb non-DB way won't
prevent you from making bad choices, but if a lot of bad choices are made
simply because of the DB (my assertion), dropping the DB would avoid
some of them. I think SourceForge has one table for all projects'
bugs & patches. That means a never-used project's bugs take up space
in the index and slow down access to the popular projects. Would a
naive file-based implementation have been just as bad? Maybe.
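
For what it's worth, the naive file-based store being imagined might look
something like the sketch below. The words/ layout, the atomic-rename trick
and the helper names are my own assumptions, not anything Wikipedia or
MoinMoin actually does:

import os
import pickle

STORE = 'words'  # assumed layout: one pickle per entry, sharded by first letter

def path_for(word):
    return os.path.join(STORE, word[0].upper(), word)

def load_entry(word):
    # the whole "database read": open one small file and unpickle it
    with open(path_for(word), 'rb') as f:
        return pickle.load(f)

def save_entry(word, entry):
    path = path_for(word)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(entry, f)
    os.rename(tmp, path)  # replace in one step so readers never see a half-written file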

If there is interest I'll follow up with some details on my own LAMP
software, which does live reports on gigs of data and is - you guessed it -
database backed, which I regret. That story also involves why I started
using Python (the prototype was in PHP).
Having said all this, I'd happily campaign for a relational backend to
MoinMoin if it would speed up the execrably slow performance we
currently see on the python.org wiki. I haven't investigated why
response times went up so much, but I have a distinct feeling it's to do
with a new release.

regards
Steve
 

Paul Rubin

Kartic said:
Architectural differences. Apache 1.3 spawns a new process for every
request, and before you know it, it brings your resources to their knees.

Oh, but it doesn't spawn new processes like that, at least if it's
configured correctly. It uses "pre-forking", which means it spawns a
bunch of processes when you first start it running, and those
processes persist and serve requests (up to N simultaneously, where N
is the number of processes). Sort of like a traditional DB connection
pool.
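
For the curious, the pre-forking model looks roughly like the sketch below in
Python. It is an illustration of the idea only (Unix-only, no error handling
or graceful shutdown), not how Apache is actually implemented:

import os
import socket

NUM_WORKERS = 5  # the "N" above: how many requests can be served simultaneously

def serve_forever(listener):
    # Each worker blocks in accept() and handles one connection at a time;
    # no new process is spawned per request.
    while True:
        conn, _addr = listener.accept()
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("", 8000))
listener.listen(128)

for _ in range(NUM_WORKERS):
    if os.fork() == 0:      # child process: inherit the socket and serve
        serve_forever(listener)
        os._exit(0)

os.wait()                   # parent just waits on its pool of workers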
The CSS way is to use <div> placement of the elements. Actually, <div>
gives better control over placement than HTML tables do. And with
CSS, since you style the various HTML tags, you can create different
"skins" for your site too. This is definitely OT, like you said, but
if you are interested, please contact me directly. I don't pretend to
be a CSS expert, but I will help you as much as I can.

I have some interest in this but I should probably just read up on it,
if it's what everyone is doing these days. Clearly I'm behind the
times about this stuff.
 

Tim Daneliuk

Fredrik said:
Tim Daneliuk wrote:




does 50 gigabytes per day, sustained, count as high performance in your book?

</F>

'Depends. It is certainly a high-volume application. But is it
transactional? Is it online or batch? Does it have correctness
guarantees, and so forth? You can FTP 50 gigabytes in one
day with big enough network pipes, but that does not a transactional
system make, for example ...
 

Maciej Mróz

Kartic said:
Paul Rubin said the following on 2/3/2005 7:20 PM:


If you are talking about Wikipedia as a prime example, I agree with you
that it is *painfully* slow.

And the reason for that is probably the way the language (PHP) is
used (this is a shot in the dark, as I have not looked into the
MediaWiki code), compounded by a probably unoptimized database. I
don't want to start flame wars here about PHP; I use PHP to build client
sites and like it for the "easy building of dynamic sites", but the
downside is that there is no "memory": every page is compiled each time
a request is made. I doubt the Wikipedia site uses an optimizer (like
Zend) or caching mechanisms. Optimizers and/or PHP caches make a huge
performance difference.

Mmcache (which is both an optimizer and a shared-memory caching library for
PHP) can do _miracles_. One of my company's servers uses Apache
1.3/PHP/mmcache to serve about 100 GB of dynamic content a day (it could
do more, but the website does not have enough visitors to stress it :) ).
There's also memcached, which is much more of interest to readers of this list -
IIRC it has a Python API.
 

M.E.Farmer

Jeffery said:
One caveat here -- I don't believe you can (should) nest a <div> inside a
<span>, or for that matter, nest any block-level element inside an inline
element.

Jeffrey,
Actually I have not shown that in the example.
But for the sake of clarity, I am glad you mentioned it.
Overall I agree with your statement that you *should* not nest a div inside a
span, BUT...
A div *can* be nested inside a span. I just tried it in Firefox, IE,
and Opera to be sure, and it does work ;)
Probably not W3C-compliant, though.
M.E.Farmer
 

Paul Rubin

Maciej Mróz said:
Mmcache (which is both an optimizer and a shared-memory caching library for
PHP) can do _miracles_. One of my company's servers uses Apache
1.3/PHP/mmcache to serve about 100 GB of dynamic content a day (it
could do more, but the website does not have enough visitors to stress it
:) ).

Yeah, that's what Wikipedia is using; I'd forgotten its name.

http://meta.wikimedia.org/wiki/PHP_caching_and_optimization

There's also memcached, which is much more of interest to readers of this list -
IIRC it has a Python API.

Oh cool, I should look at it.
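
For anyone else who wants a look, the Python API for memcached is roughly as
below. This is just a sketch using the python-memcached client; the server
address, the key names and the render_page() fallback are made-up
illustrations:

import memcache  # the python-memcached package

mc = memcache.Client(['127.0.0.1:11211'])  # assumed local memcached instance

def get_page(title):
    page = mc.get('page:' + title)
    if page is None:                # cache miss: fall back to the slow path
        page = render_page(title)   # hypothetical expensive render / DB hit
        mc.set('page:' + title, page, time=300)  # keep it for five minutes
    return page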
 

EP

Tim Daneliuk said:
This has a lot to do with the latency and speed of the connecting
network. Sites like eBay, Google, and Amazon are connected
to internet backbone nodes (for speed) and are cached throughout
the network using things like Akamai (to reduce latency)...


I agree, the problem is not always the software. :) I think Akamai reduces the network latency experienced; also, one may have multiple servers and load-balance the traffic, and local NetCaches can speed traffic by reducing the load on the server (the most requested data is cached upstream from the server).

I understand that web queries are very clustered (though I've not seen data to confirm this): we may all be making individual queries (googling about decorators?), but the queries are not as different as we might imagine. Hardware caching helps with that.

A good "enterprise" web site will have infrastructure on it's side beyond what an economical-minded non-commercial site is likely to invest in.

Transactional (secure) processing is a different animal, I think (but I would also think it is a very small percent of the overall web traffic).


No worries, Apache may run faster when it's written in PyPy. :) I guess that statement shows I am a bit biased in my thinking.
 

Brian Beck

Refactoring a database on a live system is a giant pain in the ass;
simpler file-based approaches make incremental updates easier.

The Wikipedia example has been thrown around; I haven't looked at the
code either. Except for search, why would they need a database to
look up an individual WikiWord? Going to the database requires reading
an index, when pickle.load(open('words/W/WikiWord', 'rb')) would seem sufficient.

I'm not so sure about this. If whatever is at, for example,
words/W/WikiWord is just the contents of the entry, sure. But what about
all the metadata that goes along with that record? If you were to add a
new field that is mandatory for all word records, you'd have to traverse
the words directory and update each file, and that would require your
own home-spun update scripts (of questionable quality). Worse still if the
modifications are conditional.
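
To make that concrete, the home-spun update script for adding a mandatory
field would look something like the sketch below. The directory layout and
field name are invented for illustration; note that, unlike an ALTER TABLE,
nothing guarantees it runs exactly once, or that a crash halfway through
leaves the store consistent:

import os
import pickle

def add_field(root='words', field='last_modified', default=None):
    # Walk every per-word pickle and rewrite it with the new field.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                entry = pickle.load(f)
            entry.setdefault(field, default)   # assumes each entry is a dict
            with open(path, 'wb') as f:
                pickle.dump(entry, f)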

As much as I hate working with relational databases, I think you're
forgetting the reliability even the most poorly-designed database
provides. Continuing with the words example: assuming all words would
otherwise be stored in a table, consider the process of updating the
database schema--all entries are guaranteed to conform to the new
schema. With separate files on a filesystem, there is a much higher
chance of malformed or outdated entries. Of course, this all depends on
how reliable your implementation is, but consider which aspect you'd
rather leave open to experimentation: loss of data integrity and
possibly even data itself, or speed and efficiency?
 

Lee Harr

As much as I hate working with relational databases, I think you're
forgetting the reliability even the most poorly-designed database
provides. Continuing with the words example: assuming all words would
otherwise be stored in a table, consider the process of updating the
database schema--all entries are guaranteed to conform to the new
schema.


Not only that, but with a well-designed RDBMS you can put your
schema changes inside of a transaction and make sure everything
is right before committing.

Isn't there a saying like ... those who create file-based
databases are destined to re-create a relational database
management system poorly? ;o)
 

Steve Holden

Lee said:
Not only that, but with a well-designed RDBMS you can put your
schema changes inside of a transaction and make sure everything
is right before committing.
Bear in mind, however, that *most* common RDBMSs will treat each DDL
statement as implicitly committing, so transactional change abilities
*don't* extend to schema changes.
Isn't there a saying like ... those who create file-based
databases are destined to re-create a relational database
management system poorly? ;o)

regards
Steve
 

Lee Harr

Not only that, but with a well-designed RDBMS you can put your
Bear in mind, however, that *most* common RDBMS will treat each DDL
statement as implicitly committing, so transactional change abilities
*don't* extend to schema changes.


I only use the free PostgreSQL, which can make most schema
changes inside of transaction blocks. I believe that in 8.0
you can even change the type of a column in a transaction.

Free and easy to install (even on NT now, in 8.0).

Very easy to use from Python using psycopg or SQLObject.
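
A rough sketch of what a transactional schema change looks like from Python
with psycopg2 (the connection string, table and column are made up); because
PostgreSQL DDL is transactional, a failure anywhere rolls the whole change back:

import psycopg2

conn = psycopg2.connect("dbname=wiki user=wiki")  # assumed connection details
cur = conn.cursor()
try:
    # Both statements commit together or not at all.
    cur.execute("ALTER TABLE words ADD COLUMN last_modified timestamp")
    cur.execute("CREATE INDEX words_last_modified_idx ON words (last_modified)")
    conn.commit()
except Exception:
    conn.rollback()  # nothing half-applied
    raise
finally:
    conn.close()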
 

Leif K-Brooks

Paul said:
I notice that lots of the medium-largish sites (from hobbyist BBSs to
sites like Slashdot, Wikipedia, etc.) built using this approach are
painfully slow even on seriously powerful server hardware. Yet
compared to a really large site like eBay or Hotmail (to say nothing
of Google), the traffic levels on those sites are just chickenfeed.

To some extent, I would say it has to do with the servers being used.
Slashdot has only 10 servers [1], while Wikipedia has only 39 [2];
Hotmail, on the other hand, has around 3500 [3].

A better comparison to Hotmail is the high-traffic Web site Neopets,
which has around 200 servers [4] and uses Linux, Oracle, MySQL (for a
few parts of the site), Apache, and PHP.

[1] <http://slashdot.org/faq/tech.shtml#te050>
[2] <http://meta.wikimedia.org/wiki/Wikimedia_servers>
[3] <http://www.securityoffice.net/mssecrets/hotmail.html#_Toc491601826>
[4] Estimated by counting wwwXXX.neopets.com domain names (238).
Neopets probably has at least 300 servers if you include non-Web
servers.
 

JanC

Jeffrey Froman wrote:
One caveat here -- I don't believe you can (should) nest a <div>
inside a <span>, or for that matter, nest any block-level element
inside an inline element.

The following nesting is valid AFAIK:

<span><object><div></div></object></span>

While this isn't:

<span><div></div></span>
 

Almad

M.E.Farmer said:
Paul Rubin wrote:
To emulate a table you use the div and span tags.

Sorry, but I don't think that this is the way you have to create web pages.
I think the most important thing is to make web pages "semantically valid". That
means: do not use tables for formatting the whole page, but when you have tabular
data, you have to use tables. Don't you agree?
 
