[Q] synchronize a "mocked" clock in a distributed system

Chuck Remes

I've been banging on a problem for a few days now and don't feel any closer to solving it. I'm hoping some of the big brains on the ruby ML can shed some light. Following are a few paragraphs with a brief system overview before I state the problem. I apologize in advance for this question being only tangentially related to Ruby the language. :)

I have written a distributed message passing system (in Ruby!) for doing some mathematical simulation work. Each component of the system does a very specific job. Each component may run on any of 3 distinct machines on a LAN. Components communicate with each other using the 0mq "socket" library to pass messages on well-defined ports that all components know about (hard-coded information instead of a dynamic lookup via a "service directory" mechanism).

The entire system is akin to a distributed state machine. I poke a command into it from the outside and it sets off a cascade of events which in turn generate more events until eventually I have my answer. Some of the events have timeouts or other time-based characteristics associated with them. Also, some of the returned data has time-based characteristics (e.g. a timestamp) which affect the transitions of the state machine. It's all working quite nicely in real-time.

My problem is mocking out the time source so that I can run simulations in faster than real-time. For example, I may send a request for a data record and give it a 5 second timeout. This works fine when the clock source is the actual operating system, but if I want to run faster than real-time I need to mock the clock out. That is, I want to take a simulation that might run in 4 hours real-time (with lots of waiting or other timer related delays) to run in 20 minutes because 1 second of simulation time is only a fraction of a second in the real world.

This is simple to do for a single component on a single system because I can intercept all calls to Time and replace it with my own source. However, I don't know how to get all of the distributed components (across multiple machines or multiple processes on one machine) to use a mocked clock.
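For illustration, here is a minimal sketch of the single-process case, where one object stands in for Time and runs faster than the real clock. The class name `SimClock`, its `speedup:` and `source:` parameters, and the `timed_out?` helper are all hypothetical and not taken from the system described in this post. A speedup of 12 corresponds to squeezing 4 hours into 20 minutes.

```ruby
# Hypothetical scaled clock: simulated time advances `speedup` times
# faster than the underlying real-time source.
class SimClock
  def initialize(speedup:, epoch: Time.at(0),
                 source: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @speedup = speedup
    @epoch   = epoch          # simulated time at the moment the clock starts
    @source  = source         # returns real elapsed seconds as a Float
    @origin  = source.call
  end

  # Simulated "now" as a Time.
  def now
    @epoch + (@source.call - @origin) * @speedup
  end

  # True once `timeout` simulated seconds have passed since `started_at`.
  def timed_out?(started_at, timeout)
    now - started_at >= timeout
  end
end
```

With `SimClock.new(speedup: 12.0)`, a 5 second timeout expires after roughly 0.42 real seconds. This only covers one process; it says nothing yet about keeping several processes in agreement.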

I tried googling around for answers, but all of the papers appear to be concerned with adjusting clock skew across a network where each device already has a local time source. I don't know if those solutions apply here.

Anyone have any bright ideas? Need more information?

cr
 
Robert Klemme

My problem is mocking out the time source so that I can run
simulations in faster than real-time. For example, I may send a
request for a data record and give it a 5 second timeout. This works
fine when the clock source is the actual operating system, but if I
want to run faster than real-time I need to mock the clock out. That
is, I want to take a simulation that might run in 4 hours real-time
(with lots of waiting or other timer related delays) to run in 20
minutes because 1 second of simulation time is only a fraction of a
second in the real world.

This is simple to do for a single component on a single system
because I can intercept all calls to Time and replace it with my own
source. However, I don't know how to get all of the distributed
components (across multiple machines or multiple processes on one
machine) to use a mocked clock.

I tried googling around for answers, but all of the papers appear to
be concerned with adjusting clock skew across a network where each
device already has a local time source. I don't know if those
solutions apply here.

Anyone have any bright ideas? Need more information?

A very simplistic solution would be to use DRb and have a centralized
clock. Depending on the number of clients this may of course turn out
as a bottleneck. In that case you would have to devise a more complex
mechanism.
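A rough sketch of what such a centralized DRb clock could look like follows. `DRb.start_service` and `DRbObject.new_with_uri` are the standard DRb calls; the `CentralClock` class, the helper method names, and the URI in the comments are assumptions made up for this example.

```ruby
require "drb/drb"

# Hypothetical central clock: simulated seconds advance `speedup` times
# faster than wall-clock seconds.
class CentralClock
  def initialize(speedup:, start: 0.0)
    @speedup = speedup
    @start   = start
    @origin  = monotonic
  end

  # Current simulated time in seconds since `start`.
  def now
    @start + (monotonic - @origin) * @speedup
  end

  private

  def monotonic
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Server side: publish a single clock object for every component.
def serve_clock(uri, clock)
  DRb.start_service(uri, clock)
end

# Client side: returns a proxy; each call to `now` is a network round trip.
def connect_clock(uri)
  DRbObject.new_with_uri(uri)
end

# Usage (hypothetical host and port):
#   server:  serve_clock("druby://clockhost:9000", CentralClock.new(speedup: 12.0))
#            DRb.thread.join
#   client:  clock = connect_clock("druby://clockhost:9000")
#            clock.now
```

Because every read of the clock crosses the network, the round-trip latency is what makes this a potential bottleneck with many clients.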

Maybe looking at time protocols such as NTP might give you some
inspiration. Basically you want to solve the same problem, just with a
different time source. (I don't think that a mocked NTP server will work,
because that needs local clocks with a particular precision.)

Another option might be a UDP broadcast of the "current time", if
precision limited by network latency is good enough. If not, again you
need a more complex mechanism (see time protocols).

Kind regards

robert
 
Tony Arcieri

[Note: parts of this message were removed to make it a legal post.]

The entire system is akin to a distributed state machine. I poke a command
into it from the outside and it sets off a cascade of events which in turn
generate more events until eventually I have my answer. Some of the events have
timeouts or other time-based characteristics associated with them. Also,
some of the returned data has time-based characteristics (e.g. a timestamp)
which impacts the transitioning of the state machine. It's all working quite
nicely in real-time.

My problem is mocking out the time source so that I can run simulations in
faster than real-time. For example, I may send a request for a data record
and give it a 5 second timeout. This works fine when the clock source is the
actual operating system, but if I want to run faster than real-time I need
to mock the clock out. That is, I want to take a simulation that might run
in 4 hours real-time (with lots of waiting or other timer related delays) to
run in 20 minutes because 1 second of simulation time is only a fraction of
a second in the real world.


It sounds like the way you've written your program is time-dependent, or, as
ChucK (the music language) would describe it, "strongly timed".

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
"strongly timed" synchronized distributed systems is rather non-trivial.
 
Chuck Remes

It sounds like the way you've written your program is time-dependent, or, as
ChucK (the music language) would describe it, "strongly timed".

Right off the bat my initial advice would be to eliminate the need for a
central clock in your system and make it fully asynchronous. Creating
"strongly timed" synchronized distributed systems is rather non-trivial.

Yes, I suppose it is strongly timed. I didn't realize that was going to be such a problem.

Right now it is completely asynchronous when running across multiple nodes. Each machine's clock is NTP synched so it just does the "right thing" when it runs in real-time. This notion of strongly timed doesn't rear its *ugly* head until I try to replace the clock.

I'm going to try to broadcast a clock pulse or heartbeat to all components. I can set it up so that each component uses the real clock when no clock pulse message has been received but switch over to the mocked clock when it sees the first clock message. Hopefully the delivery latencies don't cause too much trouble by skewing the time between components.
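The component side of that plan might look roughly like the sketch below. It assumes some message handler (for example a 0mq subscriber, not shown) calls `on_pulse` with the simulated time carried by each clock message; the `PulsedClock` class and its method names are hypothetical.

```ruby
# Hypothetical per-component clock: uses the real clock until the first
# pulse arrives, then reports only the time carried by pulses.
class PulsedClock
  def initialize
    @mutex    = Mutex.new
    @sim_time = nil
  end

  # Call this from whatever handler receives the clock-pulse message.
  def on_pulse(sim_time)
    @mutex.synchronize do
      # Ignore late or reordered pulses so time never runs backwards.
      @sim_time = sim_time if @sim_time.nil? || sim_time > @sim_time
    end
  end

  # True once at least one pulse has been seen.
  def mocked?
    @mutex.synchronize { !@sim_time.nil? }
  end

  # Real time before the first pulse, last pulse time afterwards.
  def now
    @mutex.synchronize { @sim_time || Time.now }
  end
end
```

Between pulses the clock stands still, so the pulse interval bounds the timer resolution, and delivery latency bounds how far two components can disagree.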

I'll try it and see. Thanks to all for the suggestions.

cr
 
Robert Dober

On Fri, Jul 2, 2010 at 12:10 AM, Robert Klemme wrote:
A very simplistic solution would be to use DRb and have a centralized clock.
Depending on the number of clients this may of course turn out as a
bottleneck. In that case you would have to devise a more complex mechanism.
Hmm, would a messaging-based time-mocking server be faster? I say that
because that was my idea, but I feel that DRb is easier to integrate.
Cheers
R

--
The best way to predict the future is to invent it.
-- Alan Kay
 
William Rutiser

Chuck said:
Yes, I suppose it is strongly timed. I didn't realize that was going to be such a problem.

Right now it is completely asynchronous when running across multiple nodes. Each machine's clock is NTP synched so it just does the "right thing" when it runs in real-time. This notion of strongly timed doesn't rear its *ugly* head until I try to replace the clock.

I'm going to try to broadcast a clock pulse or heartbeat to all components. I can set it up so that each component uses the real clock when no clock pulse message has been received but switch over to the mocked clock when it sees the first clock message. Hopefully the delivery latencies don't cause too much trouble by skewing the time between components.

I'll try it and see. Thanks to all for the suggestions.

cr
Could you set up a mock NTP time source that supplies "fast" time to its
clients, then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if
the machines are being used for anything except your tests.
 
Robert Dober

Could you set up a mock NTP time source that supplies "fast" time to its
clients, then configure each machine to use the mock NTP and update very
frequently? This may not be practical and would certainly not work if the
machines are being used for anything except your tests.
I have heard that being killed by a sysadmin is a terrible fate ;)

Cheers
R.
 
Chuck Remes

I have heard that being killed by a sysadmin is a terrible fate ;)

The idea of using a hacked NTP daemon to speed up the clocks is not feasible. Interesting idea though...

cr
 
Tony Arcieri

The idea of using a hacked NTP daemon to speed up the clocks is not
feasible. Interesting idea though...

Why can't the "central time" be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?
 
Chuck Remes

Why can't the "central time" be maintained by whatever process is scattering
work to your distributed nodes, and just asynchronously included in the
messages for use whenever your workers get around to processing them?

Because there is no centralized server that all messages, data or control must pass through.

cr
 
Tony Arcieri

Because there is no centralized server that all messages, data or control
must pass through.

If your system is fully asynchronous and there's no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.
 
Chuck Remes

If your system is fully asynchronous and there's no central data source, how
is it possible for nodes to synchronize to a central clock? That makes
absolutely no sense.

I wrote a long email describing why I thought I was right, but I kept coming back to your earlier question about a centralized data source. The problem I have with my data source is that the documents within it have different time granularities for the data. For example, some documents represent data aggregated over 1m, 1 day or 1 week. Since documents of each time granularity may be requested by various processes, I didn't see how I could use them as a source for the mock clock.

And then it hit me. I could have a mock clock process that subscribes to all of those data sources and receives all of those messages. The mock clock should *only* pay attention to the document data with the smallest time granularity for setting the clock and ignore the rest.
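As a sketch of that idea (not the actual implementation), a mock clock could advance only on documents of the finest granularity and ignore everything else. The `DataDrivenClock` class, the `tick_granularity:` parameter, and the assumption that each document exposes a granularity in seconds and a `Time` timestamp are all hypothetical.

```ruby
# Hypothetical data-driven clock: only documents whose granularity equals
# `tick_granularity` (in seconds) are allowed to move the clock.
class DataDrivenClock
  attr_reader :now

  def initialize(tick_granularity:)
    @tick_granularity = tick_granularity
    @now = nil # unknown until the first qualifying document arrives
  end

  # Feed every received document through here; returns the current clock.
  def observe(granularity:, timestamp:)
    # Coarser documents (daily, weekly, ...) never set the clock.
    return @now unless granularity == @tick_granularity

    # Keep the clock monotonic even if documents arrive out of order.
    @now = timestamp if @now.nil? || timestamp > @now
    @now
  end
end
```

The clock process would then publish `now` to the other components after each qualifying document, so simulated time moves exactly as fast as the data is replayed.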

So yes, you are right. I *do* have a central data source that I can use to set the clock. I just didn't see it before.

Thanks for pressing me on this. It forced me to really figure it out.

cr
 
Tony Arcieri

Cool, glad I could help

 
