1 Million users.. I can't Scale!!

yoda

Hi guys,
My situation is as follows:

1)I've developed a service that generates content for a mobile service.
2)The content is sent through an SMS gateway (currently we only send
text messages).
3)I've got a million users (and climbing).
4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).
5)Generating the content takes 1 second.

I'm considering moving to Stackless Python so that I can use
continuations to open a massive number of connections to the gateway
and pump the messages out to each user simultaneously (I'm thinking of
1 connection per user).

My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?
2)What architectural suggestions can you give me?
3)Has anyone encountered such a situation before? How did you deal with
it?
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

I really need help because my application currently can't scale. Some
users end up getting their data 30 seconds after generation (best case)
and up to 5 minutes after content generation. This is simply
unacceptable. The subscribers deserve much better service if my
startup is to survive in the market.
 
ncf

If you have that many users, I don't know if Python is really well
suited for such a large-scale application. Perhaps it'd be better to do
the CPU-intensive tasks in a compiled language so you can max out
performance, and then possibly use a UNIX-style socket to send/execute
instructions to the Python interface, if necessary.


Sorry I really couldn't be of much help
-Wes
 
Chris Curvey

I guess I'd look at each part of the system independently to be sure
I'm finding the real bottleneck. (It may be Python, it may not).

Under your current system, is your python program still trying to send
messages after 5 seconds? 30 seconds, 300 seconds? (Or have the
messages been delivered to SMS and they're waiting in queue there?)

If your python program is still streaming out the messages, what is it
spending time on? At a gross level, is your machine CPU-bound? If
you time out each step in your program after the content is generated,
where is all the time going (message assembly, sending over the
network, waiting for a response)?

Just by some back-of-the-envelope calculations, 1 million messages at
100 bytes each is 100 MB. That's a bunch of data to push over a network
in 2-3 seconds, especially in small chunks. (It's possible, but I'd
look at that.) Can the SMS gateway handle that kind of traffic
(incoming and outgoing)?
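A quick script makes that back-of-the-envelope arithmetic concrete (the 100-byte message size and the 3-second window are just the rough figures from above):

```python
# Back-of-the-envelope check of the network load, using the rough
# figures above (100 bytes/message, a ~3 second delivery window).
messages = 1_000_000
bytes_per_message = 100
total_bytes = messages * bytes_per_message

window_seconds = 3
required_mbits = total_bytes * 8 / 1e6 / window_seconds

print(total_bytes // 10**6, "MB total")           # 100 MB total
print(round(required_mbits), "Mbit/s sustained")  # 267 Mbit/s sustained
```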

Multi-threading may help if your python program is spending all its
time waiting for the network (quite possible). If you're CPU-bound and
not waiting on network, then multi-threading probably isn't the answer.
 
Irmen de Jong

Chris said:
Multi-threading may help if your python program is spending all its
time waiting for the network (quite possible). If you're CPU-bound and
not waiting on network, then multi-threading probably isn't the answer.

Unless you are on a multi-CPU / multi-core machine
(but mind Python's GIL).

--Irmen
 
Paul Boddie

yoda said:
2)The content is sent through an SMS gateway (currently we only send
text messages).
[...]

4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).

You surely mean a "maximum of 5 seconds"! Unfortunately, I only have a
passing familiarity with SMS-related messaging, but I imagine you'd
have to switch on any and all quality-of-service features to get that
kind of guarantee (if it's even possible).

Paul
 
Alan Kennedy

[yoda]
I really need help because my application currently can't scale. Some
users end up getting their data 30 seconds after generation (best case)
and up to 5 minutes after content generation. This is simply
unacceptable. The subscribers deserve much better service if my
startup is to survive in the market.
My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?
2)What architectural suggestions can you give me?
3)Has anyone encountered such a situation before? How did you deal with
it?
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

I highly recommend reading the following paper on the architecture of
highly concurrent systems.

A Design Framework for Highly Concurrent Systems, Welsh et al.
http://www.eecs.harvard.edu/~mdw/papers/events.pdf

The key principle that I see being applicable to your scenario is to
have a fixed number of delivery processes/threads. Welsh terms this the
"width" of your delivery channel. The number should match the number of
"delivery channels" that your infrastructure can support. If you are
delivering your SMSs by SMPP, then there is probably a limit to the
number of messages/second that your outgoing SMPP server can handle. If
you go above that limit, then you might cause thrashing or overload in
that server. If you're delivering via an actual GSM mobile serially
connected to your server/PC, then you should have a single delivery
process/thread for each connected mobile. These delivery
processes/threads would be fed by queues of outgoing SMSs.

If you want to use a multithreaded design, then simply use a python
Queue.Queue for each delivery channel. If you want to use a
multi-process design, devise a simple protocol for communicating those
messages from your generating database/process to your delivery channel
over TCP sockets.
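A minimal sketch of the multithreaded variant, with a hypothetical send_sms() standing in for the real gateway call (the module is spelled `queue` in Python 3, `Queue` in the Python 2 of this thread):

```python
# Sketch of one fixed-"width" delivery channel: CHANNEL_WIDTH worker
# threads drain a single Queue, so the gateway never sees more than
# CHANNEL_WIDTH concurrent sends. send_sms() is a made-up stand-in.
import queue
import threading

sent = []                       # stands in for the real gateway

def send_sms(msg):
    sent.append(msg)            # real code would speak SMPP/HTTP here

def worker(q):
    while True:
        msg = q.get()
        if msg is None:         # sentinel: shut this worker down
            q.task_done()
            return
        send_sms(msg)
        q.task_done()

CHANNEL_WIDTH = 8               # match what the gateway can sustain
outbox = queue.Queue()
threads = [threading.Thread(target=worker, args=(outbox,))
           for _ in range(CHANNEL_WIDTH)]
for t in threads:
    t.start()

for i in range(100):
    outbox.put("message %d" % i)
for _ in threads:
    outbox.put(None)            # one sentinel per worker
outbox.join()
for t in threads:
    t.join()
```

The width of the channel is fixed by the thread count, so the queue can absorb bursts from the generator without overloading the transmitter.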

As explained in Welsh's paper, you will get the highest stability by
ensuring that your delivery channels only receive as many messages as
the outgoing transmission mechanism can actually handle.

If you devise a multi-process solution, using TCP sockets to distribute
messages from your generating application to your delivery channels,
then it would be very straightforward to scale that up to multiple
processes running on either a multi-core CPU, a multi-CPU server, or a
network of servers.

All of this should be achievable with python.

Some questions:

1. How are you transmitting your SMSs?
2. If you disable the actual transmission, how many SMSs can your
application generate per second?

HTH,
 
Damjan

If you want to use a multithreaded design, then simply use a python
Queue.Queue for each delivery channel. If you want to use a
multi-process design, devise a simple protocol for communicating those
messages from your generating database/process to your delivery channel
over TCP sockets.

Is there some python module that provides a multi process Queue?
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Not as cleanly encapsulated as Queue, but writing a class that does that
shouldn't be all that difficult using a socket and the pickle module.
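For illustration, a minimal sketch of that socket-plus-pickle building block; `send_obj`/`recv_obj` are made-up names, and a real multi-process queue would wrap this in a small server that producers and consumers connect to:

```python
# Minimal sketch of passing pickled objects over a socket. Each message
# is length-prefixed with a 4-byte header so frames don't run together.
import pickle
import socket
import struct

def send_obj(sock, obj):
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("socket closed mid-message")
        buf += chunk
    return buf

def recv_obj(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))

# Demo with a connected socket pair in one process:
a, b = socket.socketpair()
send_obj(a, {"to": "+15550001", "text": "hello"})
print(recv_obj(b))
```

(In later Pythons, the standard library's `multiprocessing.Queue` packages up essentially this pattern.)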

Skip
 
Tim Daneliuk

yoda said:
Hi guys,
My situation is as follows:

1)I've developed a service that generates content for a mobile service.
2)The content is sent through an SMS gateway (currently we only send
text messages).
3)I've got a million users (and climbing).
4)The users need to get the data a minimum of 5 seconds after it's
generated. (not considering any bottlenecks external to my code).
5)Generating the content takes 1 second.

We need more information on just where the bottleneck might be.
There are any number of places that things could be getting
choked up and you need to get a profile of where things are
falling down before trying to fix it.

However, that said:

A possible culprit is session setup/teardown - I'm assuming connection
to the SMS gateway is connection-oriented/reliable, not datagram-based.
I suggest this because this is quite often the culprit in connection-
oriented performance problems.

If this is the case, you need to preestablish sessions and pool them for
reuse somehow so that each and every message transmission does not incur
the overhead of session setup and teardown to the gateway. It is a good
idea to make that session pooling logic adaptive. Have it start with a
minimum number of preestablished sessions to the gateway and then
monitor the message 'highwater' mark. As the system becomes starved for
sessions, allocate more to the pool. As system utilization declines,
remove spare sessions from the pool until the count falls back to the
initial minimum.

Write the pooling manager to be able to configure both the initial
session count as well as the interval for adjusting that count up and
down (i.e. Over what interval you will 'integrate' the function that
figures out just how many sessions the pool needs). Too short an interval
and the system will throw itself into feedback hysteresis trying to
figure out just how many sessions you need. Too long an interval, and the
system will exhibit poor response to changing load.
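A rough, single-threaded sketch of such an adaptive pool; `connect` is a caller-supplied factory for gateway sessions, the sessions are assumed to have a `close()` method, and the default minimum and interval are made-up numbers:

```python
# Single-threaded sketch of an adaptive session pool; a real pool would
# add locking. Grows on demand, shrinks back toward the interval's
# highwater mark, never below the configured minimum.
import time

class SessionPool:
    def __init__(self, connect, min_sessions=5, interval=30.0):
        self.connect = connect            # factory for new gateway sessions
        self.min_sessions = min_sessions
        self.interval = interval          # how often to re-evaluate the pool
        self.idle = [connect() for _ in range(min_sessions)]
        self.in_use = 0
        self.highwater = 0                # peak concurrent use this interval
        self.last_adjust = time.monotonic()

    def acquire(self):
        session = self.idle.pop() if self.idle else self.connect()
        self.in_use += 1
        self.highwater = max(self.highwater, self.in_use)
        return session

    def release(self, session):
        self.in_use -= 1
        self.idle.append(session)
        self._maybe_adjust()

    def _maybe_adjust(self):
        # Shrink at most once per interval, never below the minimum.
        now = time.monotonic()
        if now - self.last_adjust < self.interval:
            return
        target = max(self.min_sessions, self.highwater)
        while self.in_use + len(self.idle) > target and self.idle:
            self.idle.pop().close()
        self.highwater = self.in_use
        self.last_adjust = now
```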

P.S. My firm does consultancy for these kinds of problems. We're always
looking for a great new customer.

Always-Developing-New-Business-ly Yours,
 
Jeremy Jones

Damjan> Is there some python module that provides a multi process Queue?

Not as cleanly encapsulated as Queue, but writing a class that does that
shouldn't be all that difficult using a socket and the pickle module.

Skip
What about bsddb? The example code below creates a multiprocess queue.
Kick off two instances of it, one in each of two terminal windows. Do a
mp_db.consume_wait() in one first, then do a mp_db.append("foo or some
other text here") in the other and you'll see the consumer get the
data. This keeps the stuff on disk, which is not what the OP wants,
but I *think* with flipping the flags or the dbenv, you can just keep
stuff in memory:

#!/usr/bin/env python

import bsddb
import os

db_base_dir = "/home/jmjones/svn/home/source/misc/python/standard_lib/bsddb"

# Create a shared environment so multiple processes can open the queue.
dbenv = bsddb.db.DBEnv(0)
dbenv.set_shm_key(40)
dbenv.open(os.path.join(db_base_dir, "db_env_dir"),
           # bsddb.db.DB_JOINENV |
           bsddb.db.DB_INIT_LOCK |
           bsddb.db.DB_INIT_LOG |
           bsddb.db.DB_INIT_MPOOL |
           bsddb.db.DB_INIT_TXN |
           # bsddb.db.DB_RECOVER |
           bsddb.db.DB_CREATE |
           # bsddb.db.DB_SYSTEM_MEM |
           bsddb.db.DB_THREAD)

db_flags = bsddb.db.DB_CREATE | bsddb.db.DB_THREAD

# DB_QUEUE databases need fixed-length records, hence set_re_len/set_re_pad.
mp_db = bsddb.db.DB(dbenv)
mp_db.set_re_len(1024)
mp_db.set_re_pad(0)
mp_db_id = mp_db.open(os.path.join(db_base_dir, "mp_db.db"),
                      dbtype=bsddb.db.DB_QUEUE, flags=db_flags)



- JMJ
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

Skip
 
Jeff Schwab

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

How many are more than "a few?"

I think processors with multiple cores per die are going to be far more
mainstream within the next few years, so I still don't think of multiple
computers for most of my multi-processing.
 
skip

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Here's a trivial implementation of a pair of blocking queue classes:

http://orca.mojam.com/~skip/python/SocketQueue.py

Skip
 
Jeremy Jones

Damjan> Is there some python module that provides a multi process Queue?

Skip> Not as cleanly encapsulated as Queue, but writing a class that
Skip> does that shouldn't be all that difficult using a socket and the
Skip> pickle module.

Jeremy> What about bsddb? The example code below creates a multiprocess
Jeremy> queue.

I tend to think "multiple computers" when someone says "multi-process". I
realize that's not always the case, but I think you need to consider that
case (it's the only practical way for a multi-process application to scale
beyond a few processors).

Skip
Doh! I'll buy that. When I hear "multi-process", I tend to think of
folks overcoming the scaling issues that accompany the GIL. This, of
course, won't scale across computers without a networking interface.

- JMJ
 
skip

Jeff> How many are more than "a few?"

I don't know. What can you do today in commercial stuff, 16 processors?
How many cores per die, two? Four? We're still talking < 100 processors
with access to the same chunk of memory. For the OP's problem that's still
10,000 users per processor. Maybe that's small enough, but if not, he'll
need multiple processes across machines that don't share memory.

Skip
 
Steven D'Aprano

If you have that many users, I don't know if Python is really well
suited for such a large-scale application. Perhaps it'd be better to do
the CPU-intensive tasks in a compiled language so you can max out
performance, and then possibly use a UNIX-style socket to send/execute
instructions to the Python interface, if necessary.

Given that the original post contains no data indicating that the issue is
Python's execution speed, why assume that's where the problem lies, and
that the problem will be solved by throwing extra layers of software at it?

There is a difference between one million users each who make one request
once a month, and one million users who are each hammering the system with
ten requests a second. Number of users on its own is a meaningless
indicator of requirements.
 
Michael Schneider

I would need to get a better picture of your app.

I use a package called Twisted to handle large-scale computing on
multi-core and multi-computer problems.


http://twistedmatrix.com/

Hope this is useful,
Mike
 
Jeff Schwab

Jeff> How many are more than "a few?"

I don't know. What can you do today in commercial stuff, 16 processors?
How many cores per die, two? Four? We're still talking < 100 processors
with access to the same chunk of memory. For the OP's problem that's still
10,000 users per processor. Maybe that's small enough, but if not, he'll
need multiple processes across machines that don't share memory.

Sure, multiple machines are probably the right approach for the OP; I
didn't mean to disagree with that. I just don't think they are "the
only practical way for a multi-process application to scale beyond a few
processors," like you said. For many (most?) applications in need of
serious scalability, multi-processor servers are preferable. IBM has
eServers available with up to 64 processors each, and Sun sells E25Ks
with 72 processors apiece. I like to work on those sorts of machines
when possible. Of course, they're not right for every application,
especially since they're so expensive.
 
simonwittber

yoda said:
I'm considering moving to Stackless Python so that I can use
continuations to open a massive number of connections to the gateway
and pump the messages out to each user simultaneously (I'm thinking of
1 connection per user).

This won't help if your gateway works synchronously. You need to
determine what your gateway can do. If it works asynchronously,
determine the max bandwidth it can handle, then determine how many
messages you can fit into 4 seconds of that bandwidth. That should
provide you with a number of connections you can safely open and still
receive acceptable response times.
My questions therefore are:
1)Should I switch to Stackless Python, or should I carry out experiments
with multithreading the application?

You will build a more scalable solution if you create a multi process
system. This will let you deploy across multiple servers, rather than
just CPUs. Multithreading and multiprocessing will only help you if your
application is IO bound.

If your application is CPU bound, multiprocessing and multithreading
will likely hurt your performance. You will have to build a parallel
processing application which will work across different machines. This
is easier than it sounds, as Python has a great selection of IPC
mechanisms to choose from.
2)What architectural suggestions can you give me?

Multithreading will introduce extra complexity and overhead. I've
always ended up regretting any use of multithreading which I have
tried. Avoid it if you can.
3)Has anyone encountered such a situation before? How did you deal with
it?

Profile each section or stage of the operation. Find the bottlenecks,
and reduce them whichever way you can. Check your ping times. Use gigabit
or better. Remove as many switches and other hops as you can between
machines which talk to each other.

Cache content, reuse it if you can. Pregenerate content, and stick it
in a cache. Cache cache cache! :)
4)Lastly, and probably most controversial: Is python the right language
for this? I really don't want to switch to Lisp, Icon or Erlang as yet.

Absolutely. Python will let you easily implement higher level
algorithms to cope with larger problems.

Sw.
 
yoda

1. How are you transmitting your SMSs?
Currently, a number of different gateways are being used: two provide a
SOAP web service interface, and one provides a REST-based web service.

A transaction using the SOAP web services takes 3-5 seconds to complete
(from the point of calling the method to receiving an error/success
confirmation).
The REST web service transaction takes 1 second or less to complete.
2. If you disable the actual transmission, how many SMSs can your
application generate per second?
Currently, the content is generated and a number of SMSs per user are
generated. I'll have to measure this more accurately, but a cursory
glance indicated that we're generating approximately 1,000 SMSs per
second. (I'm sure this can't be right.. the parser/generator should be
faster than that:)

Additionally, I've just confirmed that the gateways we use can pump
out 20-100 SMSs per second. This is currently too slow, so we'll
probably get direct access to the mobile operator's SMSC, which provides
larger throughput.
 
