JMS scalability question


saxo123

Hi folks,

let's say there were a situation like this: Some producers send
thousands of messages through a JMS system to some thousands of
consumers. What is the best approach towards scalability?

1) 1 JMS publish-subscriber channel with one dispatcher at the end
that takes every msg from the channel and propagates it to the
respective consumer out of those thousand consumers

2) 1 JMS publish-subscriber channel where every consumer peeks a newly
arrived msg and checks whether it is for him and takes it from the
queue if so.

3) Many JMS publish-subscriber channels with one dispatcher for
everyone. I favour this one.

4) Thousand point-to-point connections. Sure not, I guess.

One problem that makes me think: if those dispatcher threads obtain
little CPU time, the messages will sit in the queue and throughput will
go down. Is making those dispatcher threads daemon threads all there is
to it?

One might say that I should go downtown and buy a book about JMS.
I already did that; I perused several books, actually. They don't tell you
what approach works best in certain situations. It's like looking for
a book that tells you about good software design. Those are almost
nonexistent as well ...

Thanks in advance.
Cheers, Oliver
 

Roedy Green

let's say there were a situation like this: Some producers send
thousands of messages through a JMS system to some thousands of
consumers. What is the best approach towards scalability?

Other than doing an experiment, you think this way.

Most likely the bottleneck is disk i/o. Pretty much any approach is
going to bog down behind the bottleneck of a single moving disk arm. To
fix that you can:

1. throw RAM at it.
2. use an SSD for the database. See
http://mindprod.com/bgloss/ssd.html
3. see if you can find a JMS implementation that is more of a RAM hog.
4. tweak the OS and JVM to do what you can to optimise amount of
Virtual RAM and disk caching RAM.

If by some chance CPU is the bottleneck, you could have a look at how
to use two independent servers with messages routed between them
over a LAN.

I would watch it running under as much load as you can muster to see
what the bottleneck is.
 

saxo123

Hi Roedy,

thanks for your reply. I see what you mean. You would mostly play with
control valves that are outside the software like adding servers or
adding RAM.

Let's say there are 10 consumers and everyone takes messages from a
queue exclusively assigned to itself. Consequently, there are 10
queues in total. Now, each queue is filled with 100 messages. I
would expect these 1000 messages in total to be processed faster this
way than, say, 1000 messages sitting in a single queue served by a
single consumer. Naively speaking, those 10 consumers should eat up
those 1000 messages 10 times faster. But this is pure mind gambling.
No clue how things would turn out in reality. Maybe the increased
network traffic of 10 consumers polling a queue instead of a single
one would decrease performance...

Do you think it would be useful to make the "message bus" pluggable in
the sense that some SPI is provided so that anyone can increase the
number of queues if considered appropriate? I wonder whether that kind
of pluggable message bus system already exists. Guess it does since
there must be a need for something like this.

Regards, Oliver
 

Lew

thanks for your reply. I see what you mean. You would mostly play with
control valves that are outside the software like adding servers or
adding RAM.

Let's say there are 10 consumers and everyone takes messages from a
queue exclusively assigned to itself. Consequently, there are 10
queues in total. Now, each queue is filled with 100 messages each. I
would expect these 1000 messages in total to be processed faster this
way than, say, 1000 messages sitting in a single queue served by a
single consumer. Naively speaking, those 10 consumers should eat up
those 1000 messages 10 times faster. But this is pure mind gambling.

Yes, it is. One wonders why you would expect ten consumers to be faster with ten queues than with one? And what about data integrity?

In some scenarios, e.g., display-processing events for a GUI, element order matters. How would you coordinate element order in ten parallel queues?

Ten consumers sipping off a single queue can be blazingly fast, if they grab from a concurrent queue and hare off in their own threads to process the data. You still might face the event-order problem, of course.
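
A minimal sketch of that single-queue setup, assuming in-VM consumers and a java.util.concurrent queue rather than a JMS broker. SharedQueueDemo and drain are made-up names, and the "work" is just a counter increment:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are not from the thread.
class SharedQueueDemo {

    // Fills one queue with messageCount messages, lets consumerCount threads
    // drain it, and returns how many messages were processed in total.
    static int drain(int messageCount, int consumerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messageCount; i++) {
            queue.add("msg-" + i);
        }
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
        for (int c = 0; c < consumerCount; c++) {
            pool.execute(() -> {
                // poll() returns null once the queue is empty, ending this worker.
                while (queue.poll() != null) {
                    processed.incrementAndGet(); // stand-in for real work
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // Every message is removed by exactly one of the ten consumers.
        System.out.println(drain(1000, 10));
    }
}
```

No consumer needs to peek and reject here because the consumers are interchangeable. The order in which messages finish processing is not defined, which is the event-order problem mentioned above.
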
No clue how things would turn out in reality. Maybe the increased
network traffic of 10 consumers polling a queue instead of a single
one would decrease performance...

Maybe, but there's no particular reason to think so, or that if it exists such a penalty would be measurable.
Do you think it would be useful to make the "message bus" pluggable in
the sense that some SPI is provided so that anyone can increase the
number of queues if considered appropriate? I wonder whether that kind
of pluggable message bus system already exists. Guess it does since
there must be a need for something like this.

One more consideration, based on many JMS-based systems I've seen in the wild: queues may be a very, very bad implementation for what you aim to accomplish. What alternative architectures have you considered, and why were they inferior?

All your talk of what's faster this, and what's maybe slower than that, is wasted mental energy right now. You don't even evince a basic architecture in your posts, which architecture must perforce conform to the logical problem you aim to solve. Details of how many queues to use, beyond one, are premature at this point. First figure out whether to use queues at all!

Also based on my experience, you really never need more than one JMS queue in each direction between communicating components. Less than one, surely, but I can't recall needing more than one except in very, very rare, high volume circumstances. We're talking millions of messages per hour per queue, not a piddling ten, before this becomes an issue.
 

saxo123

Hi Lew,

thanks for your considerations. Find my comments below. Looks like
this could become an interesting thread :).

Cheers, Oliver
Yes, it is.  One wonders why you would expect ten consumers to be faster with ten queues than with one?  And what about data integrity?

The idea is that there is a queue dedicated to each consumer. So the
consumer knows that every message in the queue is for its own
consumption. With a single queue every consumer needs to peek a
message from the queue, check whether it is for itself, and if so also
take it from the queue.
In some scenarios, e.g., display-processing events for a GUI, element order matters.  How would you coordinate element order in ten parallel queues?

Actually, I always thought that message order is not retained in JMS
since it is an asynchronous system anyway. In my setting recipients
are supposed to establish the right message order themselves anyway
since communication is asynchronous. It's about distributed actors
communicating through process boundaries using JMS.

One more consideration, based on many JMS-based systems I've seen in the wild: queues may be a very, very bad implementation for what you aim to accomplish.  What alternative architectures have you considered, and why were they inferior?

Oops. I'm surprised you think so. JMS is just made for asynchronous
message passing, no? I have thought of other systems like JavaSpaces,
Terracotta, Hazelcast, JBoss Infinispan. But those systems are not
just about message passing; they provide a whole distributed
programming system which goes beyond what I need. Except for
Infinispan, once you are looking for a clustered solution it gets
really expensive. And I'm not sure storing some message for example in
Infinispan, which triggers some event notification that upon receipt
triggers some consumer to take this message from the Infinispan
central space, is a good approach. At least a lot more complicated
than with straight queues as in JMS.
 

Roedy Green

Do you think it would be useful to make the "message bus" pluggable in
the sense that some SPI is provided so that anyone can increase the
number of queues if considered appropriate? I wonder whether that kind
of pluggable message bus system already exists. Guess it does since
there must be a need for something like this.

A piece of hardware can only process N messages per second. Fiddling
with queues can improve response time for some class of message but
won't do anything much to change N. To get more oomph, you need more
of whatever the limiting resource is, usually disk arms and
simultaneous disk channels.

In the olden days when disks were small, you usually had banks of them,
each with its own heads. If you spread your data cleverly you could
drastically reduce arm motion. With multiple "channels" you could
even get simultaneous i/o.

Today with RAID, in-disk caches, and SCSI controllers, you can
certainly overlap seeks and i/o. I don't know if you can get
simultaneous transfers though on lower end machines.

It usually boils down mostly to figuring out how to do less disk i/o
by keeping more in RAM or SSD. Nearly all "commercial" (as opposed to
scientific) programs are disk limited.

Nothing beats a test. There are so many factors that interact.

An analogous problem would be wondering which sort algorithm to use. So
you design your sort to be pluggable, so it is easy to change your mind later.
In the olden days every sort had a totally different interface. Sun
now forces this sensible approach on everyone.

If you can design your app so that how it works is configurable, you can
configure it for whatever works best at a given scale with given
hardware.
 

Lew

(e-mail address removed) wrote:

Please attribute citations.
The idea is that there is a queue dedicated to each consumer. So the
consumer knows that every message in the queue is for its own
consumption. With a single queue every consumer needs to peek a
message from the queue, check whether it is for itself, and if so also
take it from the queue.

OK, I pictured fungible consumers, where each performs the same function with received messages so there's no need for peek-and-reject.

You're talking about consumers with different purposes, receiving different messages. One might wish separate queues for that for functional reasons, never mind fantasies about optimization.

The performance impact of peek-and-decide before pulling a message will be negligible, I venture to say unnoticeable in the noise of system load. Of course, neither of us knows without performance testing and profiling, but /a priori/ one can predict that the time to look at the current queue message and decide if it's for a particular consumer will be far, far less than the transport and delivery latency. Let us know what your actual measurements tell you, and what the test conditions are. Server load will influence results!
Actually, I always thought that message order is not retained in JMS
since it is an asynchronous system anyway. In my setting recipients

Oops. Quite so.
are supposed to establish the right message order themselves anyway
since communication is asynchronous. It's about distributed actors
communicating through process boundaries using JMS.


Oops. I'm surprised you think so. JMS is just made for asynchronous

"Think" so? I observed so.
message passing, no? I have thought of other systems like JavaSpaces,

Yes, and when people layer a synchronous messaging system between components on the same physical node in the same application archive over a JMS queue, one might discover that this is a suboptimal architecture.
Terracotta, Hazelcast, JBoss Infinispan. But those systems are not
only just about message passing, they provide a whole distributed
programming system which goes beyond what I need. Except for
Infinispan once you are looking for a clustered solution it gets
really expensive. And I'm not sure storing some message for example in
Infinispan, which triggers some event notification that upon receipt
triggers some consumer to take this message from the Infinispan
central space, is a good approach. At least a lot more complicated
than with straight queues as in JMS.

This is why I asked about your use case. Not all use cases call for queues, otherwise every single program would always use them. No?

You aren't specific, given that you've only mentioned the solutions you've examined and not the purpose they should serve, but the shape of your search (assuming your analysis of what you need is correct) does indicate that your use case might be appropriate for queues.

In the wild, that is often not the case. On a project where I worked a while back, I replaced a very complex queue-based submodule with a direct method call from the component formerly on one end of the queue to a component formerly on the other end, and life got much better.

Not every JMS queue I've seen has been the wrong idea, but many have.

One downside to JMS queues is the deployment, configuration and management overhead. The benefits have to justify the costs.
 

saxo123

Hi Lew,
This is why I asked about your use case.  Not all use cases call for queues, otherwise every single program would always use them.  No?

it's about developing a little actor framework (see
http://en.wikipedia.org/wiki/Actor_model) with distributed actors
connected through some means to exchange messages over the wire.
Actors are active objects, i.e. objects that run in their own thread
(to prevent deadlock and other locking issues). The usual approach to
implement this is something like this:

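    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Message is assumed to be a framework class exposing getSelector().
    public class MyActor implements Runnable {

        // Each actor owns an inbox; other actors put messages into it.
        private final BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();

        public static void main(String[] args) {
            new Thread(new MyActor()).start();
        }

        public void run() {
            try {
                while (true) {
                    // Blocks until a message arrives.
                    Message message = queue.take();
                    if (message.getSelector().equals("doWork")) {
                        doWork();
                        return;
                    }
                }
            } catch (InterruptedException e) {
                // take() was interrupted: restore the flag and stop the actor.
                Thread.currentThread().interrupt();
            }
        }

        public void doWork() {
            System.out.println("doing my work");
        }

    }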

So messages are exchanged between actors through queues when an actor
wants to notify another actor about something. With distributed actors
those messages would have to be exchanged through the wire. Since
these messages are similar to command objects that are added to queues
my first approach to move this into a distributed setting was to use
distributed queues like JMS queues. Each actor type serves a different
purpose with all the actors working in a collaborative fashion.
You aren't specific, given that you've only mentioned the solutions you've examined and not the purpose they should serve, but the shape of your search (assuming your analysis of what you need is correct) does indicate that your use case might be appropriate for queues.

Yes, maybe it has become clearer now :)

Cheers, Oliver
 

jlp

[OT] There is an implementation of the Actor pattern (Scala and Java):
http://akka.io/docs/akka/1.3-RC1/Akka.pdf

On 17/11/2011 07:59, (e-mail address removed) wrote:
[...]
 

Fredrik Jonson

let's say there were a situation like this: Some producers send
thousands of messages through a JMS system to some thousands of
consumers. What is the best approach towards scalability?

Your problem description is too generic to say anything about your
performance. I'd claim a few thousand consumers shouldn't be a problem for
a JMS broker worth its salt.

On the other hand if those consumers are on the same jvm you'll now have
thousands of threads fighting over local resources.

So the big question is what those consumers do with the payload? Are those
thousands of consumers running on the same system, or jvm even? Then, does
the business logic in the consumers hog network, memory, cpu time, database
resources? Is the system able to cope with that kind of concurrent fight
over system resources? It might be a problem, it might not, but every system
has its bottlenecks.

What I'm trying to say is that just consuming lots of messages concurrently
by itself isn't hard. It is when you try to do something interesting with
those messages that you're likely to run into performance issues.
1) 1 JMS publish-subscriber channel with one dispatcher at the end
that takes every msg from the channel and propagates it to the
respective consumer out of those thousand consumers.

Each endpoint consumer can consume directly from JMS. What purpose would
an in-between home-grown dispatcher serve? Consuming messages isn't expensive,
and a dispatcher might turn out to be a bottleneck in the system. I'd call 1
an antipattern.
2) 1 JMS publish-subscriber channel where every consumer peeks a newly
arrived msg and checks whether it is for him and takes it from the
queue if so.

If you choose path 2, you'll want to look into message selectors and
flagging each message with headers so that consumers that should not grab the
message don't even see it. You do not want a jms consumer to
reject a dispatched message when the message cannot be redelivered to the
same consumer at a later time. Again a jms antipattern.

I'd caution against message selectors when your consumers can predict
which destination to poll. Using a dedicated destination - queue or topic
name - for each message type is more efficient when the consumers don't
have to consume several different message types.
3) Many JMS publish-subscriber channels with one dispatcher for
everyone. I favour this one.

4) Thousand point-to-point connections. Sure not, I guess.

I get the impression that your messages are unique and should only be
consumed once by a single consumer? If that's correct you should use the
point-to-point queue based jms-pattern in 4, rather than using pub-sub with
topics in 3. You'll only want to use topics when the same message should be
dispatched to multiple consumers.

The typical example use for pub-sub is stock tickers, where many clients
(consumers) want the same stock update (message) about many different stocks
(topics). The typical example for point-to-point is a purchase system,
where a new order (message) should be approved (consumed from the approval
queue) once by a single accountant (consumer) before shipping (handed over
to the order shipping queue).
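
The difference between the two delivery styles can be sketched without a broker. The following is an in-VM illustration only, not the JMS API: DeliveryStyles, PointToPoint and Topic are hypothetical names, and the demo is single-threaded, so the subscriber list needs no synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-VM stand-ins for the two messaging styles; not JMS classes.
class DeliveryStyles {

    // Point-to-point: one shared queue, each message is taken by exactly one consumer.
    static class PointToPoint {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        void send(String msg) {
            queue.add(msg);
        }

        // Returns the next message, or null when the queue is empty.
        String receive() {
            return queue.poll();
        }
    }

    // Publish-subscribe: every subscriber gets its own copy of each message.
    static class Topic {
        private final List<BlockingQueue<String>> subscribers = new ArrayList<>();

        BlockingQueue<String> subscribe() {
            BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
            subscribers.add(inbox);
            return inbox;
        }

        void publish(String msg) {
            for (BlockingQueue<String> inbox : subscribers) {
                inbox.add(msg);
            }
        }
    }

    public static void main(String[] args) {
        PointToPoint approvals = new PointToPoint();
        approvals.send("order-1");
        System.out.println(approvals.receive()); // the first consumer gets the order
        System.out.println(approvals.receive()); // a second consumer finds nothing (null)

        Topic ticker = new Topic();
        BlockingQueue<String> clientA = ticker.subscribe();
        BlockingQueue<String> clientB = ticker.subscribe();
        ticker.publish("ACME 42.0");
        System.out.println(clientA.poll() + " / " + clientB.poll()); // both see the update
    }
}
```
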
 

saxo123

[OT] There is an implementation of the Actor pattern (Scala and Java): http://akka.io/docs/akka/1.3-RC1/Akka.pdf

Yes, thanks for the link. I know about Akka. This is just about
developing my own little framework for fun and leisure. You learn much
more by developing something on your own. That's why. It's a
spare-time effort anyway. I like the idea of combining actors and STM
as in Akka. I actually had this idea myself and then discovered that
the Akka people had thought the same way. IMHO, for STM to make
really good sense in combination with actors, the STM system should
provide an event notification mechanism for when values in the shared
space change. I should mention that to the Akka people.

/Oliver
 

saxo123

Hi Fredrik,
Your problem description is too generic to say anything about your
performance. I'd claim a few thousand consumers shouldn't be a problem for
a JMS broker worth its salt.

yes, my first post was not that specific in order to keep the problem
description short. If you have a look at my reply with the MyActor
class, there is much more information provided.
On the other hand if those consumers are on the same jvm you'll now have
thousands of threads fighting over local resources.

For non-distributed communication between actors that all reside
inside the same vm I won't go with JMS, but with plain JDK1.5
BlockingQueues. What I want to develop is some kind of runtime
environment for actors to live in. A lot of the considerations you
mention below will be left to the developer using the actor platform
since not all problems can be solved in a generic way by the
platform.

What I want is to leave those kinds of decisions to the user. For this
to work the user needs to be able to make changes to the way the
queues are set up, which means that the actor platform needs to be
sufficiently configurable for this. What I'm thinking of is to allow
the user to specify the queue to be used for every actor. Maybe, by
default, the single default queue is always used if not specified
differently. So the user can add queues in case it seems appropriate.
Being able to add queues on demand means that message order must not
be assumed to be retained by the recipient, though.
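
A rough sketch of that configuration idea, assuming vm-local java.util.concurrent queues and String messages. QueueRegistry, assignDedicatedQueue and queueFor are hypothetical names, not part of any existing framework:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical registry; class and method names are illustrative only.
class QueueRegistry {
    private final BlockingQueue<String> defaultQueue = new LinkedBlockingQueue<>();
    private final Map<String, BlockingQueue<String>> dedicated = new ConcurrentHashMap<>();

    // Gives the named actor its own queue; repeated calls return the same queue.
    BlockingQueue<String> assignDedicatedQueue(String actorName) {
        return dedicated.computeIfAbsent(actorName, name -> new LinkedBlockingQueue<>());
    }

    // Returns the actor's dedicated queue, or the shared default queue if none was assigned.
    BlockingQueue<String> queueFor(String actorName) {
        return dedicated.getOrDefault(actorName, defaultQueue);
    }

    public static void main(String[] args) {
        QueueRegistry registry = new QueueRegistry();
        registry.assignDedicatedQueue("logger");
        registry.queueFor("logger").add("to logger");
        registry.queueFor("worker").add("to default");
        System.out.println(registry.queueFor("logger").size());
        // Actors without a dedicated queue share the default one.
        System.out.println(registry.queueFor("worker") == registry.queueFor("other"));
    }
}
```

Once two actors that talk to the same recipient use different queues, nothing in this sketch preserves the relative order of their messages, which matches the caveat above.
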
So the big question is what those consumers do with the payload? Are those
thousands of consumers running on the same system, or jvm even? Then, does
the business logic in the consumers hog network, memory, cpu time, database
resources? Is the system able to cope with that kind of concurrent fight
over system resources? It might be a problem, it might not, but every system
has its bottlenecks.

What I'm trying to say is that just consuming lots of messages concurrently
by itself isn't hard. It is when you try to do something interesting with
those messages that you're likely to run into performance issues.

Yes, that's right. I see. I see no way around this other than recommending
that the user avoid long-running tasks. Another approach is to make actors
interruptible. Since a thread cannot be suspended and resumed in Java,
some synchronized interruptCount flag would be incremented in case an
asynchronous event arrives, and the actor would have to poll this flag
from time to time during its execution.
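
A minimal sketch of the polling idea, with one substitution: it uses an AtomicInteger instead of a synchronized counter. InterruptibleActor, signalInterrupt and runLongTask are made-up names, and counting the pending events stands in for calling a real event handler:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical actor; the names are illustrative, not from an existing framework.
class InterruptibleActor {
    // Number of asynchronous events that arrived and were not yet handled.
    private final AtomicInteger interruptCount = new AtomicInteger();

    // Called by the dispatcher thread when an asynchronous event arrives.
    void signalInterrupt() {
        interruptCount.incrementAndGet();
    }

    // A long-running task that polls the counter between slices of work.
    // Returns how many events it noticed along the way.
    int runLongTask(int steps) {
        int eventsHandled = 0;
        for (int step = 0; step < steps; step++) {
            // ... one slice of the real work would go here ...
            int pending = interruptCount.getAndSet(0); // read and clear in one atomic step
            eventsHandled += pending; // stand-in for invoking an event handler per event
        }
        return eventsHandled;
    }

    public static void main(String[] args) {
        InterruptibleActor actor = new InterruptibleActor();
        actor.signalInterrupt();
        actor.signalInterrupt();
        // Both pending events are picked up at the first poll.
        System.out.println(actor.runLongTask(5));
    }
}
```

The responsiveness of such an actor depends entirely on how often the task polls, so the framework user still has to cooperate by keeping the work slices short.
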
Each endpoint consumer can consume directly from JMS. What purpose would
an in-between home-grown dispatcher serve? Consuming messages isn't expensive,
and a dispatcher might turn out to be a bottleneck in the system. I'd call 1
an antipattern.

When an actor is busy consuming a message it cannot respond to
external asynchronous events. In case one arrives the thread the
dispatcher runs in would pass the event on to the actor and increment
the actor's interruptCount flag. Once it is done with that, it
continues to dispatch messages.
If you choose path 2, you'll want to look into message selectors and
flagging each message with headers so that consumers that should not grab the
message don't even see it. You do not want a jms consumer to
reject a dispatched message when the message cannot be redelivered to the
same consumer at a later time. Again a jms antipattern.

I'd caution against message selectors when your consumers can predict
which destination to poll. Using a dedicated destination - queue or topic
name - for each message type is more efficient when the consumers don't
have to consume several different message types.



I get the impression that your messages are unique and should only be
consumed once by a single consumer? If that's correct you should use the
point-to-point queue based jms-pattern in 4, rather than using pub-sub with
topics in 3. You'll only want to use topics when the same message should be
dispatched to multiple consumers.

Thanks Fredrik. What you explain above is very useful.

I will start a new job in a few months where I will also work with
ActiveMQ. Maybe I will keep things in vm-local mode with my little actor
framework, using JDK1.5 BlockingQueues instead of JMS queues, and will
later add an extension using JMS queues for distributed actors
once I have learned new things about JMS in my new job. The problem is
my fear that just adding JMS queues later will turn out not to be
feasible, since basic assumptions made in the vm-local first approach
may no longer be valid once the whole thing gets
distributed... That's why I started this thread.

Regards, Oliver
 

Fredrik Jonson

my first post was not that specific in order to keep the problem
description short. If you have a look at my reply with the MyActor
class, there is much more information provided.

That message wasn't available at my server when I responded this morning.
One might almost think that the usenet deities also wanted to get a word in
in this thread about asynchronous messaging. ;)
For non-distributed communication between actors that all reside
inside the same vm I won't go with JMS, but with plain JDK1.5
BlockingQueues. What I want to develop is some kind of runtime
environment for actors to live in.

The Actor pattern implies that a message should be consumed with a once and
only once delivery guarantee. Sounds like a great point-to-point candidate
to me.

You know, IIRC the freely available version of Akka doesn't support message
persistence[0]. So if you use JMS queues for all messages, even in-vm, you'll
actually have a feature in your Actor system that even Akka doesn't
match. Throughput (TPS) will obviously be much worse compared to a
java BlockingQueue, but if your machine or jvm should crash, the messages
that haven't been acted on yet will still be there after a restart.

[0] I can't find a reference right now, I might be completely wrong here.
Anyone else know more about message persistence in Akka?
[...] Another approach is to make actors interruptible.

Without having the actors also transacted by default you can't interrupt
them reliably. Was the current message content acted on, not acted on, or
partially acted on? Now, if you use jms you can simply close the session,
and the consumer thread will roll back a transacted dispatch, or wait for
completion of a non-transacted dispatch.

I hope you get your project off the ground, sounds like fun hacking on an
interesting problem.
 

Saxo

Without having the actors also transacted by default you can't interrupt
them reliably. Was the current message content acted on, not acted on, or
partially acted on? Now, if you use jms you can simply close the session,
and the consumer thread will roll back a transacted dispatch, or wait for
completion of a non-transacted dispatch.

It can only be done with the assistance of the framework user by being
nice to the system. If an interrupt message arrives, some synchronized
flag is changed. The method being invoked at that time needs to poll
this flag periodically. If it is observed as having changed the user
has to do something himself to get the context stored, invoke the
interrupt handler, restore the context and continue. Not very elegant,
but in Java there is no API to access the threading system in order to
suspend/save context/resume threads.
I hope you get your project off the ground, sounds like fun hacking on an
interesting problem.

All right :). Yes, it would be fun and I would gain many new insights.

Regards, Oliver
 
