Real World Scalability and Ruby - Top 20

Joseph

Folks,

After the long post regarding Joe's now infamous entry about Ruby, I
wondered what the really successful, scalable, big websites and web
applications out there have used.

So I turned mostly to two sources, which while not perfect, are good
enough.

For popularity or raw scalability I used Alexa's ranking here:
http://www.alexa.com/site/ds/top_sites?ts_mode=global&lang=none

And for webserver I either used Netcraft:
http://news.netcraft.com/

or actually tried to figure out what the site used by looking at the
code or what the page used.
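As a rough illustration of the "look at what the page reports" approach, here is a minimal Ruby sketch. The helper names (stack_hints, fetch_headers) and the choice of headers are my own assumptions, not anything Netcraft or Alexa provide. Sites can strip or fake these headers, so treat whatever comes back as a guess at best.

```ruby
require "net/http"
require "uri"

# Response headers that sometimes hint at the server or framework.
# This list is an assumption for illustration; it is not exhaustive.
HINT_HEADERS = %w[server x-powered-by via].freeze

# Keep only the headers that may hint at the stack.
# Header names are compared case-insensitively and returned lowercased.
def stack_hints(headers)
  headers.each_with_object({}) do |(name, value), hints|
    key = name.to_s.downcase
    hints[key] = value if HINT_HEADERS.include?(key)
  end
end

# Issue a HEAD request and return the response headers as a Hash.
def fetch_headers(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri)
  end
  headers = {}
  response.each_header { |name, value| headers[name] = value }
  headers
end
```

Something like stack_hints(fetch_headers("http://www.example.com/")) would then show whatever the site chooses to reveal, which may well be nothing.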

This is the list of the top 20 sites and what they are apparently
using. These days many of these sites hide what goes on inside, but
we can still guess, and yes, in some cases I have given it my best
guess.

Here is the table, feel free to add more information to it since there
are still many gaps that I have identified with a question mark. The
first line mentions the site name and web server, the second line the
language or framework behind it.

1 Yahoo FreeBSD
PERL, PHP, Proprietary, C?

2 MSN Windows Server 2000/2003, Some Apache
ASP, ASP.NET, DLLs?

3 Google Linux based or unknown servers, probably modified FreeBSD or
Apache.
Python, Perl, PHP?, C, Proprietary, Java

4 Baidu.com Linux based unknown.
?

5. Qq.com Linux based unknown and Windows 2003.
?

6. MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion

7. sina.com.cn FreeBSD, Solaris 8, Linux based unknowns,
?

8. Yahoo Japan Like Yahoo at 1.

9. 163.com China FreeBSD, Linux based unknowns,
?

10 Live.com Windows 2003, Linux unknown servers
ASP, ASP.NET, DLLs?

11 eBay.com Windows 2000/2003
PERL?, C?, DLLs, Proprietary, more?

12. Sohu.com China Linux unknown servers
?

13. YouTube.com Linux unknown servers
?

14. Yahoo China Like 1

15. Microsoft Windows 2003 / 2000
ASP.net, ASP, DLLs?

16. Wikipedia FreeBSD, Linux unknown servers
PHP, PERL?

17. Amazon.com FreeBSD, Linux unknown servers, Solaris 8, Netware
PERL, Proprietary, more?

18. Orkut.com Linux unknown server
PHP?, PERL?

19. Blogger FreeBSD, Linux unknown servers
PHP?, PERL?

20. Google UK Like Google

INTERESTING FACTS

* Not a single significant "safe" Java J2EE site in the top 20.
* Many proprietary variations with FreeBSD or Linux as the only common
ground
* Some .NET and Windows 2003 are indeed listed
* Arguably the biggest web application is MySpace which is based on
Coldfusion! Certainly not a "safe" choice by a long shot.

CONCLUSIONS

* Choosing one framework or language over another seems to be mostly
irrelevant as long as you stick to the underlying technology: FreeBSD,
Linux-based servers or Windows 2003, which appear consistently in the top
web sites again and again. Also, although it is not mentioned anywhere,
Oracle, MS SQL Server and MySQL are high up there in these rankings
too.
* Java and J2EE are largely absent from this list; this should tell us
all something.
* .NET is very present on the list; MS is obviously doing something
right. The progress Ruby is making on Windows is encouraging.
* Choosing the best tools for the job can give you a big payoff.
MySpace is ColdFusion-based; this is risky, but it gives you the ability
to write database web applications fast... and it has worked well. I
would say the risk was worth it.
* Not a single major Ruby or Ruby on Rails app makes this list yet,
but I see no reason why this would not happen eventually.

Food for thought, fellows. Any input on how Ruby could ever get
there, or on who runs what, would be appreciated.

Jose L. Hurtado
Web Developer
Toronto, Canada
 
Charles O Nutter

Folks,

After the long post regarding Joe's now infamous entry about Ruby, I
wondered what the really successful, scalable, big websites and web
applications out there have used.

It seems like you're just guessing at these. I don't know all of them,
but I know for certain that eBay is running a crapload of Java in
their backend. In fact, they used to be a solid ASP site (pre-.NET)
but switched to Java because the ASP stuff scaled horribly. IBM made a
big media event out of that a few years back. I mean really, the site
has Sun/Java branding right at the top...so it's probably safe to
assume Java's involved.

It's also extremely subjective to "look at what the site uses" because
I know for a fact many large Java-based sites use URLs showing
something other than ".jsp", for reasons that are perhaps obvious. And
there's another large chunk of sites that use PHP or .NET or what have
you for web-facing stuff while the vast majority of their apps are
actually backed by large Java clusters behind some variety of service
layer.

It would probably be better to leave the guesses off the list
completely and not try to draw any conclusions at all. Unless you
really know what these sites are using (a difficult prospect at best)
no conclusions are possible.
 
Chad Perrin

Here is the table, feel free to add more information to it since there
are still many gaps that I have identified with a question mark. The
first line mentions the site name and web server, the second line the
language or framework behind it.

1 Yahoo FreeBSD
PERL, PHP, Proprietary, C?

Also Python and Common Lisp (though the Lisp codebase is not growing at
this point -- it's "legacy code" that is indispensable as long as they
keep their RTML templating system).

2 MSN Windows Server 2000/2003, Some Apache
ASP, ASP.NET, DLLs?

I believe they're still using some FreeBSD systems at Hotmail, and all
of Windows is behind free unix firewalls through a proxy service.

3 Google Linux based or unknown servers, probably modified FreeBSD or
Apache.
Python, Perl, PHP?, C, Proprietary, Java

4 Baidu.com Linux based unknown.
?

5. Qq.com Linux based unknown and Windows 2003.
?

6. MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion

Migrating to BlueDragon.NET, which uses .NET as the back end for
ColdFusion, last I checked.

7. sina.com.cn FreeBSD, Solaris 8, Linux based unknowns,
?

8. Yahoo Japan Like Yahoo at 1.

9. 163.com China FreeBSD, Linux based unknowns,
?

10 Live.com Windows 2003, Linux unknown servers
ASP, ASP.NET, DLLs?

11 eBay.com Windows 2000/2003
PERL?, C?, DLLs, Proprietary, more?

12. Sohu.com China Linux unknown servers
?

13. YouTube.com Linux unknown servers
?

14. Yahoo China Like 1

15. Microsoft Windows 2003 / 2000
ASP.net, ASP, DLLs?

16. Wikipedia FreeBSD, Linux unknown servers
PHP, PERL?

The Wikimedia Foundation (Wikipedia, Wikinews, et cetera) has to my
knowledge only ever had a grand total of one FreeBSD server, and it
wasn't really used in production. The servers are primarily running on
Fedora Core 3-5, with a couple of old Red Hat Linux and pre-Novell SuSE
Linux servers (unless those have been upgraded since I stopped working
there). The MediaWiki software is all PHP. MySQL is used for
databases. Thus, it's classic LAMP platform. There are some Perl and
Python scripts running about for various administrative tasks, but they
don't represent any kind of measurable percentage of traffic load.

There are a lot of squid proxies used for caching to serve pages faster.
There's some rudimentary load balancing (last I checked) that's handled
at least in part by in-house scripting.

The Wikimedia Foundation uses zero .NET or Java, in case you were
wondering.

17. Amazon.com FreeBSD, Linux unknown servers, Solaris 8, Netware
PERL, Proprietary, more?

18. Orkut.com Linux unknown server
PHP?, PERL?

19. Blogger FreeBSD, Linux unknown servers
PHP?, PERL?

20. Google UK Like Google

INTERESTING FACTS

* Not a single significant "safe" Java J2EE site in the top 20.
* Many proprietary variations with FreeBSD or Linux as the only common
ground
* Some .NET and Windows 2003 are indeed listed
* Arguably the biggest web application is MySpace which is based on
Coldfusion! Certainly not a "safe" choice by a long shot.

While MySpace is (again, based on what I've last heard) migrating to a
.NET foundation for its ColdFusion, it got to its current prominence
entirely on a ColdFusion 5 back-end, as far as I'm aware. Having never
been employed by MySpace, I of course cannot be as sure of this as I am
about Wikimedia Foundation information.

Food for thought, fellows. Any input on how Ruby could ever get
there, or on who runs what, would be appreciated.

It's also worth noting that Slashdot is Perl on Linux, I think via
Apache and MySQL (but don't quote me on that unless I'm right).
 
Chad Perrin

I believe they're still using some FreeBSD systems at Hotmail, and all
of Windows is behind free unix firewalls through a proxy service.

Arrrgh, typo. That should read "all of Microsoft". Sorry.
 
Joseph

Charles,

Good to know about eBay, as I said I am guessing, and getting input was
precisely what I wanted... so it seems at least one Java is out there,
are there more?

Thanks Charles!

Jose L. Hurtado
Web Developer
Toronto, Canada
 
Eero Saynatkari

Jose said:
Folks,

<snip />

Many people seem to forget that the word 'scalability'
implies bidirectionality. I assert that Ruby scales
better than Java for most things:

Small----------------------------------Large
Java <-------->
Ruby <------------------------------->

Have a nice day.
 
Seth Thomas Rasmussen

Everybody talks about this "real" world as if it is different from the
one we all experience day to day.

I hope to one day experience this world, because I'm sure I'll be
completely ready to rule them all with my vast knowledge of what works
in the "real" world.

Here's to talking heads..

P.S. Scalability. And enterprise, don't forget enterprise. My latest
benchmarks show that under certain conditions, some numbers are
produced.
 
A. S. Bradbury

The Wikimedia Foundation uses zero .NET or Java, in case you were
wondering.

This is really getting off topic, but Wikimedia do use Lucene for at least the
English search (compiled with GCJ however).

Alex
 
Chad Perrin

This is really getting off topic, but Wikimedia do use Lucene for at least the
English search (compiled with GCJ however).

Hmm. I'd forgotten about that.

It's kinda like splitting hairs, though -- which is why I didn't
remember it was technically written in Java.
 
David Vallner

Joseph said:
[snip pulling arguments out of your pinky finger]

You forget intranets. Internal company webapps have to serve humongous
amounts of traffic on not really lavish hardware. Listing the Fortune 20
of websites which indeed CAN afford to "just throw more servers at it"
tells us precisely nothing at all about technology scalability.

Windows probably sees more use for company backends than you can imagine,
on account of being easy to set up and work with up to a certain scale,
at which point you really need automation instead of an underpaid student
support gimp.

Also, your method of research is laughable.
* Choosing one Framework or language over another seems to be mostly
irrelevant as long as you stick to the underlying technology: FreeBSD,
Linux Based server or Windows 2003 which appear consistently in the top
web sites again and again.

Oh yes. Only the three most mainstream server OSs appear in that list.
Surprise.
* Java and J2EE are largely absent from this list; this should tell us
all something.

Or not, since the list is worthless data.
* .NET is very present on the list; MS is obviously doing something
right. The progress Ruby is making on Windows is encouraging.

Good marketing, IIS comes with Windows, more straightforward than Java
to do MVC and deployment with. Makes it an easier first choice nowadays.
* Choosing the best tools for the job can give you a big payoff.
MySpace is ColdFusion-based; this is risky, but it gives you the ability
to write database web applications fast... and it has worked well. I
would say the risk was worth it.

Worstofmyspace.com begs to differ. I completely ignore the very
existence of MySpace except for tidbits on the aforementioned site, but
if it's remotely to be trusted, MySpace is far from stable and reliable.

David Vallner
 
Joseph

David Vallner said:
You forget intranets. Internal company webapps have to serve humongous
amounts of traffic on not really lavish hardware. Listing the Fortune 20
of websites which indeed CAN afford to "just throw more servers at it"
tells us precisely nothing at all about technology scalability.

Aha... I would argue this is not true. Having worked for two large
corporations in the past with over 200,000 employees and huge
intranets, I can assure you that some of the very worst delays and
website/web application designs are behind the closed doors of those
intranets. And I have yet to find one internal webapp that reaches the
scalability of a public app... but then again, some government intranet
apps might truly be huge and see traffic similar to a major website's.


David said
Also, your method of research is laughable.

OK... this is a very aggressive way of making a point, isn't it, David?
I am tempted to reply, but I will ignore the attack and suggest you
give us a better research method that works in under 1 hour and does
not make us laugh, and then please do email us the results ; )

Now, don't come back and tell us you need a few thousand dollars, a
month and a research firm to find out anything about this subject, when
something can already be learned in under one hour. That was my
point: to try to shed some light on this subject, and to receive more
information from others who probably know more than I do. I
believe the open source movement calls this collaboration, and it
works.

Best Regards,

Jose Hurtado
Web Developer
Toronto, Canada
 
Joseph

Friends,

As Tim Bray suggested, I've done my best to drop the guesses from the
list and show only information I know is either true or reported by
some credible source. Where no information is available, I have just
left a question mark.

I have also updated the list with the information Chad Perrin, Charles
Nutter and Tim Bray added to it. This is the list so far, again open
for improvement:

1 Yahoo FreeBSD
PERL, PHP, Proprietary
"Also Python and Common Lisp" Chad Perrin


2 MSN Windows Server 2000/2003, Some FreeBSD
ASP, ASP.NET
"I believe they're still using some FreeBSD systems at Hotmail, and all
of Microsoft is behind free unix firewalls through a proxy service." Chad
Perrin

3 Google. Linux based or unknown servers
Python, C, Proprietary, Java

4 Baidu.com Linux based unknown.
?

5. Qq.com Linux based unknown and Windows 2003.
?

6. MySpace Windows 2003 / 2000 some Linux unknowns too.
Coldfusion
"Migrating to BlueDragon.NET, which uses .NET as the back end for
ColdFusion... currently... on a ColdFusion 5 back-end" Chad Perrin

7. sina.com.cn FreeBSD, Solaris 8, Linux based unknowns,
?

8. Yahoo Japan Like Yahoo at 1.

9. 163.com China FreeBSD and some Linux based unknowns,
?

10 Live.com Windows 2003, Linux unknown servers
ASP.NET

11 eBay.com Windows 2000/2003
PERL, Proprietary, Java J2EE

"eBay is running a crapload of Java... they used to be a solid ASP site
(pre-.NET) but switched to Java because the ASP stuff scaled
horribly...the site has Sun/Java branding...it's probably safe to
assume Java's involved. " Charles Nutter

12. Sohu.com China Linux unknown servers
?

13. YouTube.com Linux unknown servers
?

14. Yahoo China Like 1

15. Microsoft Windows 2003 / 2000, some FreeBSD at Hotmail, and UNIX
based firewalls.
ASP.net, ASP

16. Wikipedia Apache, very little FreeBSD
Mostly PHP, some minor PERL, Python and some Java for the
English search.

"a grand total of one FreeBSD server... The servers are primarily
running on Fedora Core 3-5...The MediaWiki software is all PHP. MySQL
....it's classic LAMP platform." Chad Perrin

"but Wikimedia do use Lucene [Apache Java based text search engine]
for at least the English search" A. S. Bradbury

17. Amazon.com FreeBSD, Linux unknown servers, Solaris 8, Netware
PERL, Proprietary, more?

18. Orkut.com Linux unknown server
"ASP.NET" Tim Bray


19. Blogger FreeBSD, Linux unknown servers
?

20. Google UK Like Google


Bye again,

Jose Hurtado
Web Developer
Toronto, Canada
 
Joseph

I missed Amazon on the list I just sent; it should read J2EE there too,
as was mentioned:

"Amazon, for example, uses a lot of J2EE. " Tim Smith.

Regards,

Jose Hurtado
 
Paul Lynch

I had the very faint hope that a free version of Webobjects was used,
these are still in Objective-C, sorry if I wasted your time, I tried to
pay you back with the URIs for the two Objective-C versions, but just
cannot get my hands on my last Linux-Mag France.
One obviously is GNU-Step, but the other...., sorry :(

iTMS/AppleStore/.Mac was written in WebObjects, right - and
originally in the Objective C version. WebObjects converted to Java
(not J2EE) years ago, although some users continue to run the ObjC
version of WebObjects, because of philosophical objections to Java
(nothing new there).

I believe, but can't confirm, that Apple's internal WebObjects apps
were converted to the Java version of WebObjects.

There is no 'free' version of ObjC WebObjects; GNUstep is not
WebObjects. There are some OSS implementations based on ObjC, with
and without GNUstep, but nothing that is feature complete and
compatible.

The current version of WebObjects (Java) is 'free', as it comes with
XCode, but is not Open Source.

Paul
 
Jeff Wood

That's the top 20 web sites that users directly visit. It doesn't
really give you visibility into what is running on the back end for the
sites that are included. Amazon, for example, uses a lot of J2EE.

Worse, it misses sites like the iTunes Music Store completely, which is
a huge enterprise application. iTMS is a Webobjects application, so is
running on J2EE.

On what grounds do you state that Amazon "uses a lot of J2EE" ???

Can you give us a link or something ?

Seeing as they seem to always be looking for web devs with Perl & Mason
skills, I'm just wondering how that fits into your "a lot of J2EE" ...

Looking @ the job ads from Amazon, I see lots of C++, Java & Perl jobs ...
I think they're using a bit of all 3 ( but if you look at the web
developer stuff, seems to be more heavily perl/mason than either of the
others. )

Anyways ... was just wanting to know where you get your information from.

j.
 
Paul Lynch

talking about NGObjWeb and there is GNUstepWeb (sorry for having been
imprecise), but all references seem quite old (<= 2004) so there
probably is no Objective-C implementation of WebObjects anymore. at
least none useful.
SKYRIX (who was developing NGObjWeb) seems to be more concentrated on
proprietary SW now.
What a pity, sorry for the OT noise. (Well the whole thread is
anyway ;)

The closest thing I am aware of is SOPE
(http://sope.opengroupware.org/index.html), which is a superset of NGObjWeb.
This is what I was referring to as "not very compatible" with
WebObjects - mostly because the database layer is rather different,
but other reasons apply as well; SOPE is still active. It is just
that Skyrix split SOPE into the opengroupware site. To put it into
perspective, you might think that SOPE is to WebObjects as Rails
without ActiveRecord is to full Rails. Kinda.

The last news post at www.gnustepweb.org was dated 2003 :(.

Paul
 
