Ruby Enterprise App Design Advice

TeslaOMD

I was wondering if anyone could give me some advice/thoughts/input
regarding scaling/designing in Ruby for an enterprise level app. Mainly
I am looking for suggestions for existing Ruby code/apps I can use to
address my design and concerns. I would like a lot of feedback on
FastCGI vs. running things the .NET or J2EE way. In a way I'm building
a framework, but a project specific one for an enterprise app, not for
the average personal dynamic website. I may release it after the
project is complete.

Please excuse the length of this post but I think I should explain the
situation a bit since scalability and design are broad and complex
issues. I know that many do not consider Ruby to be an enterprise level
language, but I am a believer in rapid development and scaling through
good design/programming. Language does not matter to me. I'll also
mention that I know performance != scalability, though I will say it
can help scalability, particularly vertically.

Background:

I'll try to oversimplify this in the interest of this post not being
more of a book than it already is. The software is actually a new idea
that I'm not going to reveal for creative reasons, but it is probably
closest to some of the existing social networking/dating apps. The app
must support over a million users (yes, this is realistic if not a
gross underestimate if all goes well). We expect to have several
thousand concurrent users. AJAX and querying DB data will play a heavy
role in the system. Users will have lots of preferences/settings to
manage/configure, most of which will be stored to disk as xml
(serialized objects perhaps?) rather than the database.

Our reason for selecting Ruby is because we want to continue to bring
more attention to what we feel is a wonderful language (thank you Rails
and others for contributing already) and also since we want to reduce
our time to market as much as possible. We also want to try to give
back some of our end results to the Ruby community in terms of code.

Our development team is very experienced in web apps, but new to the
Ruby world as far as serious applications. We are trying to find out
more about what is already out there, needs to be done, and what
limitations we might face. We've done extensive research but I would
like some opinions directly from rubyists, rubycons, whatever. We'd
like to scale horizontally and vertically and leverage cheap hardware
to start until the site gets going more.

Current Specs/Design:

Our design uses MVC/n-tier. Business logic should be able to run
independently on its own servers and not care about the web. Some of
our basic design/plans are as follows:

1. Web Server -- Lighttpd - seems to work well with FastCGI, very
quick. Load balanced to send user to best server. Dedicated servers for
things like serving images.

2. FastCGI or SCGI - We would like to replace FastCGI with something
else if possible since we have concerns about all of our processes
being constantly occupied by AJAX polling back to server code. We're
not entirely convinced FastCGI is a great architecture for us but if we
do use it, we would like to scale it across many servers and use a
SessionID to bind a user to a server. I worry about running out of
available FastCGI processes, even with multiple machines.

3. Database -- MySQL or Postgres - We need transaction support and data
integrity -- doubts about MySQL in these areas. We also need good join
performance - database is heavily normalized, may denormalize if
needed. Separate DBs for things like Logging/Auditing vs. Content. Want
to cluster DBs. We want connection pooling. Considering making some
modifications to DBI. Use stored procedures in the database, no sql in
the code/dynamic sql. I loathe O/R mappers for complex databases and
they would likely bring our system to a screeching halt. Unfortunately
we cannot afford to use Oracle right now.

4. Caching -- We'll cache user settings, DB data, etc. Need a good ruby
caching solution. Considering using memcached or something else
existing.

5. Templating -- Not satisfied with anything. Considered Clearsilver
but now developing our own system in mainly C++ that is more
accommodating to the way our site works/AJAX.

6. Unit Testing and Performance numbers for every procedure. Is there a
good code profiling tool for Ruby? Use both with makefile process.
Where we identify bottlenecks, we may refactor or write in C++. We may
move some of the critical components to C++ where possible. We've
looked into using SWIG to help our C++ efforts.

7. Sessions - Separate server for managing Sessions if possible. I
would like to persist things in memory and share if possible. If not
possible, we'll settle for persisting info to the database. Been
looking at Session affinity for Ruby some.

8. AJAX - Polling and Queuing system for AJAX interactions in
JavaScript. Queuing server side. Likely we will use and possibly extend
an existing AJAX library such as Mochikit or base it off another
existing one.

9. Remote - Looked at DRb for some remote things.

10. External Services - We may expose some web services in the future
or have internal web services. We may also be providing RSS feeds from
things like blogs or lists. We'll use REXML for our XML needs, but we
may switch to C++. I've heard performance concerns about REXML but have
yet to test for myself.

11. Events/Threading - I am worried about Ruby here, especially after
reading about Myriad's problems with libevent. We will definitely need
something similar to delegates and events and some good queueing and
threading functionality.

We looked at Rails like everyone else but after using it a bit, reading
the author's blog, and from previous experience in other languages it
is clearly not suitable for the size and complexity of our application.
I also will reiterate that I hate O/R mapping unless it's for a quick
personal app.

Concerns: Our main concern is of course scalability. Our AJAX controls
(many already finished) will need to poll the server in some cases over
a specified interval. We know that this is going to create some new
demands that we did not have to worry about in the purely synchronous
webpage development model. We have already seen these issues some in
our .NET, J2EE, PHP, and Python apps, but none of them have to deal
with this much traffic. We feel that many of our AJAX controls are
going to create unique demands on the application, particularly given
the environment Ruby runs in and its threading limitations.

For instance, a purely hypothetical example might be we have a control
that lists online users along with status information. The controls
would need to requery information every 20 seconds to obtain fresh
information about the users (are they online? what is their current
mood? what did they last do?). This means that our server is going to
get hit a lot harder than your typical web application that serves up a
dynamic page, then sits idle until the user moves on to a new page.

Final Thoughts:

I believe my greatest concerns are shared memory, caching, sessions,
and the FastCGI dynamic. We need to support a lot of simultaneous users
and I'm worried the process model needs to be perhaps replaced with a
threadpool/queuing model (one that responds quickly though). Caching on
all levels will have to take a lot of the pressure off of us. I've
never been a fan of Session variables but we do need to manage a lot of
session info. We need a lot of user to user interaction so this is
another concern with the FastCGI setup.

The read-write ratio in the app is probably about 70% read 30% write.
At any one time however we can expect the app is doing a significant
amount of writes to the DB but not compared to the amount of reads, so
keep this in mind. Our entire design needs to factor this into the
equation.

I hope that even though I'm new here I can get some good suggestions.
I'm excited to write something on this level in Ruby and get away from
the .NET, PHP, and J2EE stuff I've worked a lot with in the past. FYI,
I started doing C, Pascal, COBOL, etc. in desktop and client/server
apps for many years before moving on to mainly web apps so forgive me
if I'm naturally skeptical of everything.

Thanks and I would deeply appreciate any input no matter how big or
small.
 
Wilson Bilkovich

I'll insert my comments inline. Pardon any mangling I manage to do to
your message.

In a way I'm building
a framework, but a project specific one for an enterprise app, not for
the average personal dynamic website. I may release it after the
project is complete.

This is a good idea in general, because it forces you to see the
problem in an abstract way, and to make sure you have full test
coverage. Rails is much better as an open project than as an in-house
framework, from a code quality perspective.

I know that many do not consider Ruby to be an enterprise level
language, but I am a believer in rapid development and scaling through
good design/programming.

I've been using Ruby in production in an enterprise environment for
some time. Most enterprise (read, Java) experts like to ignore the
amount of code tied up in "non-enterprise" languages like shell
script. If Ruby isn't an enterprise language, is Perl? It sounds
like you're not too worried about the viewpoint of CIO magazine, so I
think you can go ahead and use Ruby if you want. Heh.

Users will have lots of preferences/settings to
manage/configure, most of which will be stored to disk as xml
(serialized objects perhaps?) rather than the database.

Do you have a scalability plan for this? Databases are mostly a
'solved problem', but persisting things to disk in a cluster is much
harder. Are you worried about the overhead of the database for this
task?

We also want to try to give
back some of our end results to the Ruby community in terms of code.

Very cool.

3. Database -- MySQL or Postgres - We need transaction support and data
integrity -- doubts about MySQL in these areas. We also need good join
performance - database is heavily normalized, may denormalize if
needed. Separate DBs for things like Logging/Auditing vs. Content. Want
to cluster DBs. We want connection pooling. Considering making some
modifications to DBI. Use stored procedures in the database, no sql in
the code/dynamic sql. I loathe O/R mappers for complex databases and
they would likely bring our system to a screeching halt. Unfortunately
we cannot afford to use Oracle right now.

If your database is heavily normalized, isn't that precisely where ORM
systems shine? It's usually with complex 'legacy' schema that you run
into trouble. Object mappers save so much developer time, I'd rather
spend the extra time writing better business logic, and optimizing the
database design. Slogging through endless "stuff these fields into
these properties" modules is a recipe for difficult refactoring.

Personally, I've really only used Oracle with Ruby. Between MySQL and
Postgres, my choice would certainly be the latter. I much prefer
sequences to autoincrements, and it's nice to have subselects not be a
recently-added feature. That being said, many huge systems are
running on MySQL. You just need to be familiar with its limitations
while designing your app.

4. Caching -- We'll cache user settings, DB data, etc. Need a good ruby
caching solution. Considering using memcached or something else
existing.

Rails comes with pretty slick caching. Even if you don't want to use
Rails itself, you could at least borrow their caching code. Remember,
also, that you can use Rails without ActiveRecord, if you don't like
ORM libraries.
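
For illustration only, here is a minimal sketch of the cache-aside pattern that any of these options (Rails caching, memcached, or something home-grown) would implement. It is not Rails' caching code: `TtlCache` is a hypothetical name, the store is a plain in-process Hash standing in for memcached, and it is not thread-safe.

```ruby
# A tiny in-process cache with per-entry expiry (illustrative only).
# The clock is injectable so expiry can be exercised without sleeping.
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
  end

  # Return the cached value for key if it is still fresh; otherwise run
  # the block (e.g. a DB lookup), remember its result, and return it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry.value if entry && entry.expires_at > now

    value = yield
    @store[key] = Entry.new(value, now + @ttl)
    value
  end

  # Drop an entry, e.g. after the underlying row has been written.
  def delete(key)
    @store.delete(key)
  end
end
```

A caller would wrap the expensive lookup, for example `cache.fetch("settings:42") { load_settings(42) }`, where `load_settings` is whatever hypothetical method reads the settings from disk or the database.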

5. Templating -- Not satisfied with anything. Considered Clearsilver
but now developing our own system in mainly C++ that is more
accommodating to the way our site works/AJAX.

What exactly is ERb not doing for you? I've never been a big template
fan, myself, except maybe for what Wicket uses: (Warning: Java!)
http://wicket.sourceforge.net/

6. Unit Testing and Performance numbers for every procedure. Is there a
good code profiling tool for Ruby?

Check out Ryan Davis's awesome work, in the form of ZenProfiler,
RubyInline, Ruby2c, etc.
http://rubyforge.org/projects/zenhacks/

7. Sessions - Separate server for managing Sessions if possible. I
would like to persist things in memory and share if possible. If not
possible, we'll settle for persisting info to the database. Been
looking at Session affinity for Ruby some.

You can do this with DRb, which is extremely fast and handy.
Alternately, you can just store sessions in the DB, which is
convenient because it yields fewer 'things' to optimize and profile.

8. AJAX - Polling and Queuing system for AJAX interactions in
JavaScript. Queuing server side. Likely we will use and possibly extend
an existing AJAX library such as Mochikit or base it off another
existing one.

Have you taken a look at Zimbra? They have a very cool implementation of
this.
http://www.zimbra.com/

10. External Services - We may expose some web services in the future
or have internal web services. We may also be providing RSS feeds from
things like blogs or lists. We'll use REXML for our XML needs, but we
may switch to C++. I've heard performance concerns about REXML but have
yet to test for myself.

Even if REXML didn't serve your needs, you could use RubyInline to
re-implement slow parts of it in C, without tossing out all your
Ruby-ness.

11. Events/Threading - I am worried about Ruby here, especially after
reading about Myriad's problems with libevent. We will definitely need
something similar to delegates and events and some good queueing and
threading functionality.

In my opinion, you should stay away from this kind of code in Ruby
until 2.0 is out. Luckily, it is easy to hook Ruby up to C code, and
you can write your performance-critical code in that language instead.
I'm glad that Myriad thread made it to the list. Very illuminating.

We looked at Rails like everyone else but after using it a bit, reading
the author's blog, and from previous experience in other languages it
is clearly not suitable for the size and complexity of our application.
I also will reiterate that I hate O/R mapping unless it's for a quick
personal app.

Well, the authors of Hibernate don't agree with you. Heh. Are you
sure this personal preference is worth the development time cost?

Concerns: Our main concern is of course scalability. Our AJAX controls
(many already finished) will need to poll the server in some cases over
a specified interval.

This is pretty much a non-issue in the CGI model. Just add more
webservers as needed. There are Rails sites doing 1M+ page requests a
day on only three cheap rackmount boxes. You should focus your
concern on the database. If your database can handle the load,
scaling Ruby / CGI is super easy.

For instance, a purely hypothetical example might be we have a control
that lists online users along with status information. The controls
would need to requery information every 20 seconds to obtain fresh
information about the users (are they online? what is their current
mood? what did they last do?). This means that our server is going to
get hit a lot harder than your typical web application that serves up a
dynamic page, then sits idle until the user moves on to a new page.

Be very careful with this. There was a good article in ACM Queue
about this recently:
http://acmqueue.org/modules.php?name=Content&pa=showpage&pid=337
Google's advice is to add as much delay between your client requests
as possible, and they use the auto-refresh concept as an example.
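
One possible way to apply that advice, sketched here as an assumption rather than anything from the article: let the server tell each client how long to wait before its next poll, and stretch that wait while nothing is changing. The method name `next_poll_interval`, the doubling rule, and the 300-second cap are all illustrative choices; the 20-second base comes from the example above.

```ruby
# Suggest how many seconds a client should wait before polling again.
#   idle_polls: consecutive polls that returned nothing new
#   base:       interval used right after something changed
#   cap:        upper bound so an idle client still checks in eventually
def next_poll_interval(idle_polls, base: 20, cap: 300)
  idle = [idle_polls, 0].max          # treat negative counts as zero
  [base * (2**idle), cap].min         # double per idle poll, up to the cap
end

# Intervals for a client that sees no changes for five polls in a row.
puts (0..4).map { |n| next_poll_interval(n) }.inspect
# => [20, 40, 80, 160, 300]
```

The server would reset `idle_polls` to zero whenever a poll returns fresh data, so active pages stay responsive while idle ones back off.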

We need to support a lot of simultaneous users
and I'm worried the process model needs to be perhaps replaced with a
threadpool/queuing model (one that responds quickly though).

Remember, Apache HTTPd uses the process model as well, and scales
pretty much as far as you want. In fact, I'd say that the process
model has in general been shown to be an easier scaling problem than
threading. It's threads that I worry about. Processes are easy, and
can even be migrated to other systems at runtime, without stopping
them.

Let me leave you with the old adage: Don't optimize prematurely.
Developers are, in general, NOT good at predicting which pieces of
code will end up being hit hardest. Don't spend 6 months implementing
an event queue only to find it using 1% of the total resources.
Write the code in Ruby, profile it, rewrite the parts that are too
slow.
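
As a small illustration of "measure first", the sketch below times two interchangeable implementations with Ruby's standard `benchmark` library. The task (joining values into a comma-separated line) and the method names are made up for the example; the point is only the habit of comparing measured numbers before rewriting anything.

```ruby
require 'benchmark'

# Hand-rolled version of a (hypothetical) hot spot.
def join_with_concat(values)
  out = String.new
  values.each_with_index do |v, i|
    out << "," unless i.zero?
    out << v.to_s
  end
  out
end

# Same result using the built-in Array#join.
def join_builtin(values)
  values.join(",")
end

# Run the block several times and return the best wall-clock time in seconds.
def best_time(runs = 5)
  (1..runs).map { Benchmark.realtime { yield } }.min
end

values = (1..20_000).to_a
puts format("concat:  %.6fs", best_time { join_with_concat(values) })
puts format("builtin: %.6fs", best_time { join_builtin(values) })
```

The timings will vary by machine and Ruby version, which is exactly why they are worth measuring rather than guessing.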

Best of luck,
--Wilson.
 
TeslaOMD

Thanks for the reply Wilson. Pardon my mangled reply as well.

-I believe our plan is to just use MySQL and Postgres side by side for
a while. We're going to be forced to double write things but that should
not be too much extra work. We have some complex queries, for example
we need to return 1-7 depths of a web network in a query (think
Friendster but a little more sophisticated). We've done the same thing
in other projects, but the problem becomes tuning it to the database we
select. I'm working on some scripts to start filling our tables with
millions of rows so we can test this thing under a big data load from
the start.
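
To make the "1-7 depths of a web network" requirement concrete, here is an in-memory sketch of a depth-limited breadth-first traversal. It only illustrates the shape of the result; the real query would live in the database, and the graph layout (a Hash from user to an array of direct contacts) and the name `network_within` are assumptions made for the example.

```ruby
# Return { user => depth } for everyone reachable from start within
# max_depth hops. graph maps each user to an array of direct contacts.
def network_within(graph, start, max_depth)
  depths = { start => 0 }
  frontier = [start]

  1.upto(max_depth) do |depth|
    next_frontier = []
    frontier.each do |user|
      graph.fetch(user, []).each do |contact|
        next if depths.key?(contact)   # already reached by a shorter path
        depths[contact] = depth
        next_frontier << contact
      end
    end
    break if next_frontier.empty?      # nothing new at this depth
    frontier = next_frontier
  end

  depths
end
```

Because each user is recorded the first time it is reached, the depth stored is always the shortest distance from the starting user, which is usually what a "friends of friends" listing wants.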

-We don't want to use Rails though we may take ideas/parts where
permission allows us to do so. I've looked a lot at Rails and would use
it for a smaller app, but I don't think it's well suited to what we
want to do. We're experienced developers and prefer our own mess, plus
we would like to show that there are alternatives to Rails for using
Ruby via the web. Others are doing the same. We also don't like being
tied into a framework -- work with Spring, .NET, Struts, etc. has
taught us that frameworks are great, but often too generic or generate
too much work for specific tasks. Rails is no doubt a great framework,
but not for us and I doubt it would scale to meet our needs rendering
very complex templates and bringing back lots of DB data that an O/R
mapper might choke trying to do.

-We don't want to use an O/R mapper because of the complexity of a lot
of our queries. I am a fan of SQL, and I've been certified in other DBs
before so I know my way around the language. I like the idea of stored
procedures (segregating code in one place, reusing across systems,
assigning proper rights per procedure, caching, etc). Obviously some of
this is dependent on our choice of DB, but regardless I prefer to
use stored procs. I've dealt a lot with O/R mappers in the past and
performance tuning in Oracle and SQL Server 2000 and I can tell you
that moving to stored procedures from an O/R mapper can make a world
of difference. You spend a lot more time and maintenance, but you gain
some more flexibility. Some people like the best of both worlds, but I
prefer to keep set based operations to SQL in the DB server. I don't
want an O/R mapper touching my precious schema :)


-Our reason for not using ERb is a difficult one to understand without
being involved in the project I admit. We need a system to allow users
to upload their own themes. We also have a complex widget system (think
portals) that may one day be nice to support on a thick client.
Standard templating would be a lot of work per widget. I would rather
design a templating system that works very fast and easy for what we
need but requires more time up front. ERb reminds me of ASP days, no
thanks. I've read David H.'s reasons for using ERb and I think they are
valid, but not valid for us. I am not a fan of code in the template.
That said, my developers and I are disciplined enough not to do
anything bad in a template. ERb would be my third choice behind
Clearsilver.
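
For readers wondering what "no code in the template" can look like, here is a minimal sketch of a logic-less template in Ruby. It is not the C++ system described above; the `{{name}}` placeholder syntax and the method names are assumptions chosen for the example. Because the template can only name keys, a user-uploaded theme cannot execute anything.

```ruby
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&#39;"
}.freeze

# Escape the characters that are special in HTML.
def escape_html(text)
  text.to_s.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Replace {{key}} placeholders with escaped values from the hash.
# Unknown placeholders raise instead of silently rendering nothing.
def render_theme(template, values)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) do
    key = Regexp.last_match(1)
    value = values.fetch(key) { raise KeyError, "unknown placeholder: #{key}" }
    escape_html(value)
  end
end
```

A theme author would write something like `<span class="user">{{ name }}</span>`, and the controller would supply `{ "name" => ... }` without the theme ever seeing Ruby.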

-Thanks for the tip on Zimbra and the Zen profiler. Regarding the
threading code and some of my fastcgi/ajax concerns, I think I could
take a tip from Jetty and use Ruby continuations in conjunction with
the response/request. Our group is very experienced with threading so I
am not intimidated, but in the past I have dealt with the nightmares of
deadlocks in particular as well as race conditions, memory bullying,
etc. It seems to me like Ruby doesn't have a lot of the nice things in
Java and .NET to deal with some of the classic threading issues.

Thanks again for pointing me towards some projects I did not know
about. Any other thoughts you have would be great because your feedback
at the very least gets me questioning myself and new ideas flowing.
FYI, like I think I mentioned we've used Ruby, just not in the context
of a million user system.
 
Robert Klemme

<disclaimer>I have done web applications and I have done enterprise level
apps but not enterprise level web apps of this scale. I do have quite
some experience with databases though.</disclaimer>

TeslaOMD wrote:

I'll try to oversimplify this in the interest of this post not being
more of a book than it already is. The software is actually a new idea
that I'm not going to reveal for creative reasons, but it is probably
closest to some of the existing social networking/dating apps. The app
must support over a million users (yes, this is realistic if not a
gross underestimate if all goes well). We expect to have several
thousand concurrent users. AJAX and querying DB data will play a heavy
role in the system. Users will have lots of preferences/settings to
manage/configure, most of which will be stored to disk as xml
(serialized objects perhaps?) rather than the database.

Did you consider writing user configs using YAML? It's human-readable and
you don't have to fiddle with XML mappings etc.
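
A minimal sketch of what that could look like with Ruby's standard `yaml` library. The setting names and the two helper methods are hypothetical; the point is that plain hashes, strings, numbers and booleans round-trip without any mapping code.

```ruby
require 'yaml'

# Serialize a settings hash to human-readable YAML text.
def dump_settings(settings)
  YAML.dump(settings)
end

# Parse YAML text back into plain Ruby objects. safe_load only builds
# basic types, which is a sensible default for user-supplied files.
def load_settings(yaml_text)
  YAML.safe_load(yaml_text)
end

# Hypothetical per-user settings.
settings = {
  "theme" => "dark",
  "poll_interval_seconds" => 20,
  "show_online_status" => true,
  "widgets" => ["friends", "messages"]
}

puts dump_settings(settings)
```

The resulting text can be written to disk per user and edited by hand if needed.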

Current Specs/Design:

Our design uses MVC/n-tier. Business logic should be able to run
independently on its own servers and not care about the web. Some of
our basic design/plans are as follows:

1. Web Server -- Lighttpd - seems to work well with FastCGI, very
quick. Load balanced to send user to best server. Dedicated servers
for things like serving images.

2. FastCGI or SCGI - We would like to replace FastCGI with something
else if possible since we have concerns about all of our processes
being constantly occupied by AJAX polling back to server code. We're
not entirely convinced FastCGI is a great architecture for us but if
we do use it, we would like to scale it across many servers and use a
SessionID to bind a user to a server. I worry about running out of
available FastCGI processes, even with multiple machines.

An architectural idea: since you want to use sessions to pin users to a
certain instance and want to have a single session server (if I understand
you correctly) you could set up your session server as a DRb server that
deals with login and logout and assigns an app server. App servers use
DRb to register with the session server and also are DRb servers for
FastCGI handlers. The FastCGI client programs are only small stubs that
ask the session server for the app server and then connect to the app
server and probably stream results back to the web client. There is room
for optimization and tuning: mapping from session to app server can be
cached to save one internal network communication. You are free to decide
where to do the HTML generation - on the app server or more likely in the
FastCGI program. To better utilize resources you can have several app
servers per machine (i.e. several ruby processes that use different ports
for DRb communication).
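
A rough sketch of the session-server half of this idea, under stated assumptions: the class name `SessionDirectory`, its method names, the "least sessions wins" assignment rule, and the example URI are all illustrative, not an existing library. Only `DRb.start_service`, `DRb.thread` and `DRbObject.new_with_uri` are real DRb calls.

```ruby
require 'drb/drb'

# Front object for the session server: app servers register their DRb
# URIs, and FastCGI stubs ask which app server owns a given session.
class SessionDirectory
  def initialize
    @app_servers = []   # registered app server URIs
    @assignments = {}   # session id => app server URI
    @lock = Mutex.new   # DRb handles each remote call in its own thread
  end

  def register(uri)
    @lock.synchronize { @app_servers << uri unless @app_servers.include?(uri) }
    true
  end

  # Return the app server for a session, assigning the one with the
  # fewest sessions the first time the session is seen.
  def app_server_for(session_id)
    @lock.synchronize do
      @assignments[session_id] ||= begin
        raise "no app servers registered" if @app_servers.empty?
        load = Hash.new(0)
        @assignments.each_value { |uri| load[uri] += 1 }
        @app_servers.min_by { |uri| load[uri] }
      end
    end
  end

  def logout(session_id)
    @lock.synchronize { @assignments.delete(session_id) }
    true
  end
end

# Expose the directory over DRb (the URI is an example value).
def serve_session_directory(uri = "druby://localhost:9000")
  DRb.start_service(uri, SessionDirectory.new)
  DRb.thread
end
```

A FastCGI stub would then obtain a proxy with `DRbObject.new_with_uri("druby://localhost:9000")` and call `app_server_for(session_id)` on it; caching that answer in the stub saves the extra round trip mentioned above.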
3. Database -- MySQL or Postgres - We need transaction support and
data integrity -- doubts about MySQL in these areas. We also need
good join performance - database is heavily normalized, may
denormalize if needed. Separate DBs for things like Logging/Auditing
vs. Content. Want to cluster DBs. We want connection pooling.
Considering making some modifications to DBI. Use stored procedures
in the database, no sql in the code/dynamic sql. I loathe O/R mappers
for complex databases and they would likely bring our system to a
screeching halt. Unfortunately we cannot afford to use Oracle right
now.

I don't have experience with postgres and mysql. Given the fact that
mysql is quite new and so are stored procedures and the way it deals with
transactions I'd probably tend to use postgres.

4. Caching -- We'll cache user settings, DB data, etc. Need a good
ruby caching solution. Considering using memcached or something else
existing.

Difficult to comment on this without further detailing the app
architecture. Just one remark: I would try to not cache DB content as
databases do this already and they are usually quite good at doing this.

7. Sessions - Separate server for managing Sessions if possible. I
would like to persist things in memory and share if possible. If not
possible, we'll settle for persisting info to the database. Been
looking at Session affinity for Ruby some.

see above.

8. AJAX - Polling and Queuing system for AJAX interactions in
JavaScript. Queuing server side. Likely we will use and possibly extend
an existing AJAX library such as Mochikit or base it off another
existing one.

9. Remote - Looked at DRb for some remote things.

see above.

11. Events/Threading - I am worried about Ruby here, especially after
reading about Myriad's problems with libevent. We will definitely need
something similar to delegates and events and some good queueing and
threading functionality.

If your FastCGI processes are lightweight then ruby threading is probably
good enough.
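
For reference, a minimal sketch of the threadpool/queue model using only Ruby's built-in `Thread` and `Queue`. The class name `WorkerPool` and its methods are hypothetical; a real server would add timeouts and better error reporting.

```ruby
# A fixed-size worker pool: jobs go onto a thread-safe Queue and a small
# number of threads drain it, so bursts wait in line instead of each
# spawning a new thread or process.
class WorkerPool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # A nil job is the shutdown signal for one worker.
        while (job = @jobs.pop)
          begin
            job.call
          rescue StandardError => e
            warn "job failed: #{e.message}"   # keep the worker alive
          end
        end
      end
    end
  end

  # Queue a block to be run by the next free worker.
  def submit(&job)
    @jobs << job
    self
  end

  # Let queued jobs finish, then stop all workers.
  def shutdown
    @workers.size.times { @jobs << nil }
    @workers.each(&:join)
  end
end
```

`Queue#pop` blocks while the queue is empty, so idle workers cost nothing but memory, and `Queue` does the locking that would otherwise need a hand-written mutex.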


<snip/>

Kind regards

robert
 
Kevin Brown

I don't have experience with postgres and mysql. Given the fact that
mysql is quite new and so are stored procedures and the way it deals with
transactions I'd probably tend to use postgres.

You could write a whole book here, but from what you say I'd go with Postgres.
If you want to search through the MySQL support forums you'll find a lot more
complaints about corrupt dbs, etc. Postgres favors reliability over speed
and ease of use. Not that it's slow at all, just that that was their first
concern. I almost always use Postgres now just because it's so nice to work
with (in my opinion of course). :)
 
TeslaOMD

Kevin,

Thanks. My thoughts were also towards reliability over speed. Our only
concern is the performance of some complex queries. We'll of course
spend a lot of time tuning for Postgres and see how it goes. If
anyone's interested, maybe I'll post some Ruby/MySQL and Ruby/Postgres
performance numbers once we get everything set up. It should be a few
weeks.
 
TeslaOMD

Robert,

I like your idea about the Session server and DRb. I think we'll have
to get things going and just see how it works. I seem to understand the
basics of the DRb examples, but I'm wondering if anyone knows how
performant it really is? I suppose I'll see for myself after a few
weeks of code. Thanks for the feedback.
 
Kirk Haines

On Thursday 01 December 2005 8:19 am, Kirk Haines wrote:

Guh. Apologies for forgetting to trim the quoted sections of that post. Bad
me. Bad.


Thanks,

Kirk Haines
 
Jules Jacobs

Would XSLT be an option for templates? It is a webstandard, and you
could do this in C/C++ or use an existing library.
 
Kirk Haines

I like your idea about the Session server and DRb. I think we'll have
to get things going and just see how it works. I seem to understand the
basics of the DRb examples, but I'm wondering if anyone knows how
performant it really is? I suppose I'll see for myself after a few
weeks of code. Thanks for the feedback.

I've benchmarked some simple DRb tasks before. It's not as fast as I'd like,
but about as fast as I expected.

I don't have any hard numbers because it's been a while since I did it, but it
seems like when I was playing with accessing a hash held by DRb, I was
getting about 100 calls per second, running both client and server on the
same machine.

I have a database connection pool being shared by a number of low usage web
apps and sites right now via DRb. It's fast enough that I can't notice a
speed difference on any of the apps compared to each of them having their own
private pool, but with the loads you are talking about, I think a single DRb
session server would be a bottleneck.
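
For context, a connection pool itself can be quite small. The sketch below is not the pool described above and leaves out the DRb sharing entirely; `SimplePool` is a hypothetical name, and the "connections" are whatever objects the factory block returns.

```ruby
# A minimal connection pool: a fixed set of connections handed out one
# at a time through a thread-safe Queue.
class SimplePool
  def initialize(size, &factory)
    @available = Queue.new
    size.times { @available << factory.call }
  end

  # Check a connection out, yield it, and always return it to the pool,
  # even if the block raises.
  def with_connection
    conn = @available.pop   # blocks until a connection is free
    begin
      yield conn
    ensure
      @available << conn
    end
  end

  # Number of connections currently idle.
  def available
    @available.size
  end
end
```

Callers would write `pool.with_connection { |db| ... }`, so a connection can never be leaked by forgetting to hand it back.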


Kirk Haines
 
TeslaOMD

Jules,

Unfortunately I don't feel XSLT is very human readable. Our current
concept is a template system similar to XUL but much simpler. This way
we can create a thick client and webpages without changing any of the
controller code.
 
