1 Million users.. I can't Scale!!

yoda

Hi guys,
My situation is as follows:

1)I've developed a service that generates content for a mobile service.
2)The content is sent through an SMS gateway (currently we only send
text messages).
3)I've got a million users (and climbing).
4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).
5)Generating the content takes 1 second.

I'm considering moving to Stackless Python so that I can use
continuations to open a massive number of connections to the gateway
and pump the messages out to each user simultaneously (I'm thinking of
1 connection per user).

My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?
2)What architectural suggestions can you give me?
3)Has anyone encountered such a situation before? How did you deal with
it?
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

I really need help because my application currently can't scale. Some
users end up getting their data 30 seconds after generation (best case)
and up to 5 minutes after content generation. This is simply
unacceptable. The subscribers deserve much better service if my
startup is to survive in the market.
 
ncf

If you have that many users, I don't know if Python is really well
suited for such a large-scale application. Perhaps it'd be better to do
the CPU-intensive tasks in a compiled language so you can max out
performance, and then possibly use a UNIX-style socket to send/execute
instructions to the Python interface, if necessary.


Sorry I really couldn't be of much help
-Wes
 
Chris Curvey

I guess I'd look at each part of the system independently to be sure
I'm finding the real bottleneck. (It may be Python, it may not).

Under your current system, is your python program still trying to send
messages after 5 seconds? 30 seconds, 300 seconds? (Or have the
messages been delivered to SMS and they're waiting in queue there?)

If your python program is still streaming out the messages, what is it
spending time on? At a gross level, is your machine CPU-bound? If
you time out each step in your program after the content is generated,
where is all the time going (message assembly, sending over the
network, waiting for a response)?

Just by some back-of-the-envelope calculations, 1 million messages at
100 bytes each is 100 MB. That's a bunch of data to push over a network
in 2-3 seconds, especially in small chunks. (It's possible, but I'd
look at that.) Can the SMS gateway handle that kind of traffic
(incoming and outgoing)?
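A quick script makes that back-of-the-envelope arithmetic concrete (the 100-byte message size and the 3-second window are just the rough figures from above):

```python
# Back-of-the-envelope check of the network load, using the rough
# figures above (100 bytes/message, a ~3 second delivery window).
messages = 1_000_000
bytes_per_message = 100
total_bytes = messages * bytes_per_message

window_seconds = 3
required_mbits = total_bytes * 8 / 1e6 / window_seconds

print(total_bytes // 10**6, "MB total")           # 100 MB total
print(round(required_mbits), "Mbit/s sustained")  # 267 Mbit/s sustained
```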

Multi-threading may help if your python program is spending all its
time waiting for the network (quite possible). If you're CPU-bound and
not waiting on network, then multi-threading probably isn't the answer.
 
Irmen de Jong

Chris said:
Multi-threading may help if your python program is spending all its
time waiting for the network (quite possible). If you're CPU-bound and
not waiting on network, then multi-threading probably isn't the answer.

Unless you are on a multi-CPU / multi-core machine
(but mind Python's GIL).

--Irmen
 
Paul Boddie

yoda said:
2)The content is sent through an SMS gateway (currently we only send
text messages).
[...]

4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).

You surely mean a "maximum of 5 seconds"! Unfortunately, I only have a
passing familiarity with SMS-related messaging, but I imagine you'd
have to switch on any and all quality-of-service features to get that
kind of guarantee (if it's even possible).

Paul
 
Alan Kennedy

[yoda]
I really need help because my application currently can't scale. Some
users end up getting their data 30 seconds after generation (best case)
and up to 5 minutes after content generation. This is simply
unacceptable. The subscribers deserve much better service if my
startup is to survive in the market.
My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?
2)What architectural suggestions can you give me?
3)Has anyone encountered such a situation before? How did you deal with
it?
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

I highly recommend reading the following paper on the architecture of
highly concurrent systems.

A Design Framework for Highly Concurrent Systems, Welsh et al.
http://www.eecs.harvard.edu/~mdw/papers/events.pdf

The key principle that I see being applicable to your scenario is to
have a fixed number of delivery processes/threads. Welsh terms this the
"width" of your delivery channel. The number should match the number of
"delivery channels" that your infrastructure can support. If you are
delivering your SMSs by SMPP, then there is probably a limit to the
number of messages/second that your outgoing SMPP server can handle. If
you go above that limit, then you might cause thrashing or overload in
that server. If you're delivering via an actual GSM mobile serially
connected to your server/PC, then you should have a single delivery
process/thread for each connected mobile. These delivery
processes/threads would be fed by queues of outgoing SMSs.

If you want to use a multithreaded design, then simply use a python
Queue.Queue for each delivery channel. If you want to use a
multi-process design, devise a simple protocol for communicating those
messages from your generating database/process to your delivery channel
over TCP sockets.
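A minimal sketch of the multithreaded variant, with a hypothetical send_sms() standing in for the real gateway call (the module is spelled `queue` in Python 3, `Queue` in the Python 2 of this thread):

```python
# Sketch of one fixed-"width" delivery channel: CHANNEL_WIDTH worker
# threads drain a single Queue, so the gateway never sees more than
# CHANNEL_WIDTH concurrent sends. send_sms() is a made-up stand-in.
import queue
import threading

sent = []                       # stands in for the real gateway

def send_sms(msg):
    sent.append(msg)            # real code would speak SMPP/HTTP here

def worker(q):
    while True:
        msg = q.get()
        if msg is None:         # sentinel: shut this worker down
            q.task_done()
            return
        send_sms(msg)
        q.task_done()

CHANNEL_WIDTH = 8               # match what the gateway can sustain
outbox = queue.Queue()
threads = [threading.Thread(target=worker, args=(outbox,))
           for _ in range(CHANNEL_WIDTH)]
for t in threads:
    t.start()

for i in range(100):
    outbox.put("message %d" % i)
for _ in threads:
    outbox.put(None)            # one sentinel per worker
outbox.join()
for t in threads:
    t.join()
```

The width of the channel is fixed by the thread count, so the queue can absorb bursts from the generator without overloading the transmitter.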

As explained in Welsh's paper, you will get the highest stability by
ensuring that your delivery channels only receive as many messages as
the outgoing transmission mechanism can actually handle.

If you devise a multi-process solution, using TCP sockets to distribute
messages from your generating application to your delivery channels,
then it would be very straightforward to scale that up to multiple
processes running on either a multi-core CPU, a multi-CPU server, or a
network of servers.

All of this should be achievable with python.

Some questions:

1. How are you transmitting your SMSs?
2. If you disable the actual transmission, how many SMSs can your
application generate per second?

HTH,
 
Damjan

If you want to use a multithreaded design, then simply use a python
Queue.Queue for each delivery channel. If you want to use a
multi-process design, devise a simple protocol for communicating those
messages from your generating database/process to your delivery channel
over TCP sockets.

Is there some python module that provides a multi process Queue?
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Not as cleanly encapsulated as Queue, but writing a class that does that
shouldn't be all that difficult using a socket and the pickle module.
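For illustration, a minimal sketch of that socket-plus-pickle building block; `send_obj`/`recv_obj` are made-up names, and a real multi-process queue would wrap this in a small server that producers and consumers connect to:

```python
# Minimal sketch of passing pickled objects over a socket. Each message
# is length-prefixed with a 4-byte header so frames don't run together.
import pickle
import socket
import struct

def send_obj(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed mid-message")
        buf += chunk
    return buf

def recv_obj(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

# Demo with a connected socket pair in one process:
a, b = socket.socketpair()
send_obj(a, {"to": "+15550001", "text": "hello"})
print(recv_obj(b))
```

(In later Pythons, the standard library's `multiprocessing.Queue` packages up essentially this pattern.)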

Skip
 
Tim Daneliuk

yoda said:
Hi guys,
My situation is as follows:

1)I've developed a service that generates content for a mobile service.
2)The content is sent through an SMS gateway (currently we only send
text messages).
3)I've got a million users (and climbing).
4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).
5)Generating the content takes 1 second.

We need more information on just where the bottleneck might be.
There are any number of places that things could be getting
choked up and you need to get a profile of where things are
falling down before trying to fix it.

However, that said:

A possible culprit is session setup/teardown - I'm assuming connection
to the SMS gateway is connection-oriented/reliable, not datagram-based.
I suggest this because this is quite often the culprit in connection-
oriented performance problems.

If this is the case, you need to preestablish sessions and pool them for
reuse somehow so that each and every message transmission does not incur
the overhead of session setup and teardown to the gateway. It is a good
idea to make that session pooling logic adaptive. Have it start with a
minimum number of preestablished sessions to the gateway and then
monitor the message 'highwater' mark. As the system becomes starved for
sessions, allocate more to the pool. As system utilization declines,
remove spare sessions from the pool until the count falls back to the
initial minimum.

Write the pooling manager to be able to configure both the initial
session count as well as the interval for adjusting that count up and
down (i.e. Over what interval you will 'integrate' the function that
figures out just how many sessions the pool needs). Too short an interval
and the system will throw itself into feedback hysteresis trying to
figure out just how many sessions you need. Too long an interval, and the
system will exhibit poor response to changing load.
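A rough, single-threaded sketch of such an adaptive pool; `connect` is a caller-supplied factory for gateway sessions, the sessions are assumed to have a `close()` method, and the default minimum and interval are made-up numbers:

```python
# Single-threaded sketch of an adaptive session pool; a real pool would
# add locking. Grows on demand, shrinks back toward the interval's
# highwater mark, never below the configured minimum.
import time

class SessionPool:
    def __init__(self, connect, min_sessions=5, interval=30.0):
        self.connect = connect            # factory for new gateway sessions
        self.min_sessions = min_sessions
        self.interval = interval          # how often to re-evaluate the pool
        self.idle = [connect() for _ in range(min_sessions)]
        self.in_use = 0
        self.highwater = 0                # peak concurrent use this interval
        self.last_adjust = time.monotonic()

    def acquire(self):
        session = self.idle.pop() if self.idle else self.connect()
        self.in_use += 1
        self.highwater = max(self.highwater, self.in_use)
        return session

    def release(self, session):
        self.in_use -= 1
        self.idle.append(session)
        self._maybe_adjust()

    def _maybe_adjust(self):
        # Shrink at most once per interval, never below the minimum.
        now = time.monotonic()
        if now - self.last_adjust < self.interval:
            return
        target = max(self.min_sessions, self.highwater)
        while self.in_use + len(self.idle) > target and self.idle:
            self.idle.pop().close()
        self.highwater = self.in_use
        self.last_adjust = now
```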

P.S. My firm does consultancy for these kinds of problems. We're always
looking for a great new customer.

Always-Developing-New-Business-ly Yours,
 
Jeremy Jones

Damjan> Is there some python module that provides a multi process Queue?

Not as cleanly encapsulated as Queue, but writing a class that does that
shouldn't be all that difficult using a socket and the pickle module.

Skip
What about bsddb? The example code below creates a multiprocess queue.
Kick off two instances of it, one in each of two terminal windows. Do a
mp_db.consume_wait() in one first, then do a mp_db.append("foo or some
other text here") in the other and you'll see the consumer get the
data. This keeps the stuff on disk, which is not what the OP wants,
but I *think* with flipping the flags or the dbenv, you can just keep
stuff in memory:

#!/usr/bin/env python

import bsddb
import os

db_base_dir = "/home/jmjones/svn/home/source/misc/python/standard_lib/bsddb"

# Create a shared environment so multiple processes can open the queue.
dbenv = bsddb.db.DBEnv(0)
dbenv.set_shm_key(40)
dbenv.open(os.path.join(db_base_dir, "db_env_dir"),
           # bsddb.db.DB_JOINENV |
           bsddb.db.DB_INIT_LOCK |
           bsddb.db.DB_INIT_LOG |
           bsddb.db.DB_INIT_MPOOL |
           bsddb.db.DB_INIT_TXN |
           # bsddb.db.DB_RECOVER |
           bsddb.db.DB_CREATE |
           # bsddb.db.DB_SYSTEM_MEM |
           bsddb.db.DB_THREAD)

db_flags = bsddb.db.DB_CREATE | bsddb.db.DB_THREAD

# DB_QUEUE databases need fixed-length records, hence set_re_len/set_re_pad.
mp_db = bsddb.db.DB(dbenv)
mp_db.set_re_len(1024)
mp_db.set_re_pad(0)
mp_db_id = mp_db.open(os.path.join(db_base_dir, "mp_db.db"),
                      dbtype=bsddb.db.DB_QUEUE, flags=db_flags)



- JMJ
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

Skip
 
Jeff Schwab

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

How many are more than "a few?"

I think processors with multiple cores per die are going to be far more
mainstream within the next few years, so I still don't think of multiple
computers for most of my multi-processing.
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Here's a trivial implementation of a pair of blocking queue classes:

http://orca.mojam.com/~skip/python/SocketQueue.py

Skip
 
Jeremy Jones

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

Skip
Doh! I'll buy that. When I hear "multi-process", I tend to think of
folks overcoming the scaling issues that accompany the GIL. This, of
course, won't scale across computers without a networking interface.

- JMJ
 
skip

Jeff> How many are more than "a few?"

I don't know. What can you do today in commercial stuff, 16 processors?
How many cores per die, two? Four? We're still talking < 100 processors
with access to the same chunk of memory. For the OP's problem that's still
10,000 users per processor. Maybe that's small enough, but if not, he'll
need multiple processes across machines that don't share memory.

Skip
 
Steven D'Aprano

If you have that many users, I don't know if Python is really well
suited for such a large-scale application. Perhaps it'd be better to do
the CPU-intensive tasks in a compiled language so you can max out
performance, and then possibly use a UNIX-style socket to send/execute
instructions to the Python interface, if necessary.

Given that the original post contains no data indicating that the issue is
Python's execution speed, why assume that's where the problem lies, and
that the problem will be solved by throwing extra layers of software at it?

There is a difference between one million users each who make one request
once a month, and one million users who are each hammering the system with
ten requests a second. Number of users on its own is a meaningless
indicator of requirements.
 
Michael Schneider

I would need to get a better picture of your app.

I use a package called Twisted to handle large-scale computing on
multi-core and multi-computer problems.


http://twistedmatrix.com/

Hope this is useful,
Mike
 
Jeff Schwab

Jeff> How many are more than "a few?"

I don't know. What can you do today in commercial stuff, 16 processors?
How many cores per die, two? Four? We're still talking < 100 processors
with access to the same chunk of memory. For the OP's problem that's still
10,000 users per processor. Maybe that's small enough, but if not, he'll
need multiple processes across machines that don't share memory.

Sure, multiple machines are probably the right approach for the OP; I
didn't mean to disagree with that. I just don't think they are "the
only practical way for a multi-process application to scale beyond a few
processors," like you said. For many (most?) applications in need of
serious scalability, multi-processor servers are preferable. IBM has
eServers available with up to 64 processors each, and Sun sells E25Ks
with 72 processors apiece. I like to work on those sorts of machines
when possible. Of course, they're not right for every application,
especially since they're so expensive.
 
simonwittber

yoda said:
I'm considering moving to Stackless Python so that I can use
continuations to open a massive number of connections to the gateway
and pump the messages out to each user simultaneously (I'm thinking of
1 connection per user).

This won't help if your gateway works synchronously. You need to
determine what your gateway can do. If it works asynchronously,
determine the max bandwidth it can handle, then determine how many
messages you can fit into 4 seconds of that bandwidth. That should
provide you with a number of connections you can safely open and still
receive acceptable response times.
My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?

You will build a more scalable solution if you create a multi process
system. This will let you deploy across multiple servers, rather than
just CPUs. Multithreading and multiprocessing will only help you if your
application is IO bound.

If your application is CPU bound, multiprocessing and multithreading
will likely hurt your performance. You will have to build a parallel
processing application which will work across different machines. This
is easier than it sounds, as Python has a great selection of IPC
mechanisms to choose from.
2)What architectural suggestions can you give me?

Multithreading will introduce extra complexity and overhead. I've
always ended up regretting any use of multithreading which I have
tried. Avoid it if you can.
3)Has anyone encountered such a situation before? How did you deal with
it?

Profile each section or stage of the operation. Find the bottlenecks,
and reduce them whichever way you can. Check your ping times. Use gigabit
or better. Remove as many switches and other hops as you can between
machines which talk to each other.

Cache content, reuse it if you can. Pregenerate content, and stick it
in a cache. Cache cache cache! :)
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

Absolutely. Python will let you easily implement higher level
algorithms to cope with larger problems.

Sw.
 
yoda

1. How are you transmitting your SMSs?
Currently, a number of different gateways are being used: two provide a
SOAP web service interface, and one provides a REST-based web service.

A transaction using the SOAP web services takes 3-5 seconds to complete
(from the point of calling the method to receiving an error/success
confirmation).
The REST web service transaction takes 1 second or less to complete.
2. If you disable the actual transmission, how many SMSs can your
application generate per second?
Currently, the content is generated and a number of SMSs per user are
generated. I'll have to measure this more accurately, but a cursory
glance indicated that we're generating approximately 1,000 SMSs per
second. (I'm sure this can't be right.. the parser/generator should be
faster than that:)

Additionally, I've just confirmed that the gateways we use can pump
out 20-100 SMSs per second. This is currently too slow, so we'll
probably get direct access to the mobile operator's SMSC, which provides
larger throughput.
 
