OT: squid-type cache for XML?


CptDondo

OK, this is OT for this group, but I really have no idea where to post this.

I am working on a project where a 'client' periodically queries a number
of 'servers'. The exchanges are done using XML.

There is one client and an awful lot of servers (hundreds), and
bandwidth is limited. It can take hours for the client to query all of
the servers in round-robin fashion. (We can't use exception reporting
or have the servers report for technical reasons.)

My solution is to develop intermediate proxy-cache boxes, which would
query servers in their subnet and cache the results. The client then
would only need to query the proxies.

This seems like a pretty simple idea, and there are solutions out there for
HTML proxies doing this sort of thing.

Is anyone aware of anything out there for XML queries?
 

Benjamin Niemann

Hello,

CptDondo said:
OK, this is OT for this group, but I really have no idea where to post
this.

I am working on a project where a 'client' periodically queries a number
of 'servers'. The exchanges are done using XML.

There is one client and an awful lot of servers (hundreds), and
bandwidth is limited. It can take hours for the client to query all of
the servers in round-robin fashion. (We can't use exception reporting
or have the servers report for technical reasons.)

My solution is to develop intermediate proxy-cache boxes, which would
query servers in their subnet and cache the results. The client then
would only need to query the proxies.

This seems like a pretty simple idea, and there are solutions out there for
HTML proxies doing this sort of thing.

Is anyone aware of anything out there for XML queries?

Proxies like squid work on the protocol level (HTTP) - they do not care what
kind of data is being transferred.
If you are using HTTP to fetch the XML data, then you should be able to use
any generic HTTP proxy including squid.
Just make sure that the data is cacheable at all: proper HTTP headers, data
is fetched using GET, not POST...
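For instance, a response along these lines (header values purely
illustrative) is something squid can cache for five minutes:

```
HTTP/1.1 200 OK
Content-Type: application/xml
Cache-Control: max-age=300
Content-Length: 1024
```

Without an explicit freshness header like Cache-Control or Expires, the
proxy has to guess, and may refetch far more often than you'd like.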
You could install cronjobs on or near the proxy servers, which pull the data
(via the proxy) and just drop it - to make sure the data is in the cache,
when your client comes around. A simple bash script with lots
of 'wget -O /dev/null http://...' might be sufficient.
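As a sketch of that cronjob (the proxy address and the URL list are
assumptions, not anything from your setup):

```shell
#!/bin/sh
# Pre-fetch each server's XML through the local cache so it is warm
# before the real client polls. The proxy address is hypothetical;
# squid listens on port 3128 by default.
PROXY="http://localhost:3128"

warm_cache() {
    # Expects one URL per line on stdin; fetches via the proxy and
    # throws the body away -- the point is only to populate the cache.
    while IFS= read -r url; do
        wget -q -O /dev/null -e use_proxy=yes -e http_proxy="$PROXY" "$url"
    done
}
```

Run it from cron shortly before each polling cycle, e.g.
'warm_cache < server-urls.txt'.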

HTH
 

CptDondo

Benjamin said:
Proxies like squid work on the protocol level (HTTP) - they do not care what
kind of data is being transferred.
If you are using HTTP to fetch the XML data, then you should be able to use
any generic HTTP proxy including squid.
Just make sure that the data is cacheable at all: proper HTTP headers, data
is fetched using GET, not POST...
You could install cronjobs on or near the proxy servers, which pull the data
(via the proxy) and just drop it - to make sure the data is in the cache,
when your client comes around. A simple bash script with lots
of 'wget -O /dev/null http://...' might be sufficient.

That's a neat idea....

I don't control the client, so I'll have to see whether XML-over-HTTP will
work, but at least I can talk intelligently to my (human) client about
the issue.... :)

--Yan
 

Andy Dingley

CptDondo said:
It can take hours for the client to query all of
the servers in round-robin fashion.
My solution is to develop intermediate proxy-cache boxes,

This isn't proxying (and so I don't think Squid will help).

If you had a squillion clients querying one server with an identical
request, then you could cache that. Your problem, though, is one
client querying lots of endpoints -- effectively many totally separate
requests. You can't cache that -- worse, if you did cache it, all the
"servers" would appear to report the same result!

You might be able to proxy this by setting up proxies (custom-written
but simple) somewhere that had good bandwidth to the servers, then
reported their results in some "denser" fashion to the client. This
isn't a transparent proxy though.

Chances are that you could even do this in-house, maybe even by just
rewriting the client to be multi-threaded. Is it really bandwidth
that's the problem here, or latency?
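If it is latency, even a shell-level version of that multi-threaded
client is easy to sketch (the URL list is hypothetical):

```shell
#!/bin/sh
# Fetch many endpoints concurrently instead of strictly round-robin.
# With up to 8 parallel fetches, wall-clock time is dominated by the
# slowest servers rather than by the sum of all the round trips.
fetch_all() {
    # One URL per line on stdin. -P 8 runs up to 8 wget processes at
    # once; -r (a GNU extension) skips wget entirely if the list is
    # empty. Bodies are discarded here, but a real client would save
    # them, e.g. with wget -P results/.
    xargs -r -n 1 -P 8 wget -q -O /dev/null
}
```

Usage would be along the lines of 'fetch_all < server-urls.txt'.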
 

Toby Inkster

CptDondo said:
I am working on a project where a 'client' periodically queries a number
of 'servers'. The exchanges are done using XML.

There is one client and an awful lot of servers (hundreds), and
bandwidth is limited. It can take hours for the client to query all of
the servers in round-robin fashion.

XML is fairly bulky. Have you thought of compressing the entire exchange
with gzip? It ought to reduce bandwidth by 60% or so.
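For a rough feel of the saving, one can gzip a sample payload (the XML
below is synthetic; a real exchange will compress differently):

```shell
#!/bin/sh
# Compare raw vs gzip-compressed size of a repetitive XML payload:
# 500 near-identical elements, much like periodic status reports.
yes '<reading sensor="temp" value="21.5"/>' | head -n 500 > sample.xml
orig=$(wc -c < sample.xml)
comp=$(gzip -c sample.xml | wc -c)
echo "raw: $orig bytes, gzipped: $comp bytes"
```

Highly repetitive XML like this compresses far better than 60%; real
payloads land somewhere in between.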
 
