> One architectural suggestion: rather than extending/overriding Net::HTTP,
> make your cache a separate object.
I'm intending to make the cache itself a separate object. However, I am
planning to extend Net::HTTP#get to use the cache, because, with proper
HTTP/1.1 semantics on the cache, it will make no difference to the
caller -- for all it knows, it could already be getting a cached
response. Honestly, I think that integrating the cache with #get makes
the most sense.
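That transparent extension could be sketched like this -- a minimal,
assumption-laden sketch: the module name and the naive in-memory hash are
mine, not a real design, and it ignores freshness entirely:

```ruby
require 'net/http'

# Hypothetical sketch: wrap Net::HTTP#get so callers transparently get
# cached responses. The CachingGet name and the bare in-memory store
# are assumptions; a real version would honour HTTP/1.1 freshness.
module CachingGet
  STORE = {}

  def get(path, initheader = nil, dest = nil, &block)
    key = "#{address}:#{port}#{path}"
    return STORE[key] if STORE.key?(key)    # cache hit: no network I/O
    response = super                        # cache miss: real fetch
    STORE[key] = response if response.is_a?(Net::HTTPSuccess)
    response
  end
end

Net::HTTP.prepend(CachingGet)
```

With Module#prepend, existing callers of Net::HTTP#get need no changes at
all, which is exactly the "no difference to the caller" property.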
> Then, Net::Cache#get can return an object from its cache (which is just an
> instance variable of Net::Cache); the cache also contains instances of the
> object(s) that do the fetching from remote hosts when there is a cache miss.
> I think there are a number of advantages to this approach:
> - no messing with an existing foundation class
Consider me ambitious, but I'd like to see code like this make it as far
as inclusion in Ruby some day. ;-)
> - the potential to have multiple methods for retrieving objects
>   (e.g. Net::FTP as well as Net::HTTP)
Those are different objects ;-)
> - a clear division of responsibility between the cache and the
>   object retrieval protocol
That's hard to do while staying HTTP/1.1 compliant. The caching requires
metadata that doesn't exist for FTP or filesystem objects... Some basic
support would be possible, but at least some metadata object would be
required; I'd say that the HTTP headers are that data at the moment.
Even so, I'll keep it abstract enough to not invite duplication.
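As a concrete illustration of headers-as-metadata, a cache entry's validity
under HTTP/1.1 freshness rules might look roughly like this -- a sketch
under assumptions: `Entry` is a made-up name, and real compliance involves
much more (Vary, revalidation, heuristic freshness):

```ruby
require 'time' # for Time.httpdate

# Sketch: the response headers serve as the cache metadata. The Entry
# struct and its fresh? logic are illustrative assumptions only.
Entry = Struct.new(:body, :headers, :stored_at) do
  def fresh?(now = Time.now)
    cache_control = headers['cache-control'].to_s
    if cache_control =~ /max-age=(\d+)/
      now - stored_at < Regexp.last_match(1).to_i
    elsif (expires = headers['expires'])
      now < Time.httpdate(expires)
    else
      false # no freshness metadata: the entry must be revalidated
    end
  end
end
```

An FTP or filesystem object has no such headers, which is why some
substitute metadata object would be needed there.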
> With some care, you should be able to make your cache thread-safe: if a
> request for object X comes in while the cache is already fetching that
> object, it could wait until the retrieval is complete. The cache can then
> sit behind DRb, for example, so it can be accessed by multiple processes
> simultaneously. Equally, if you get a request for object Y while a fetch for
> object X is taking place, you can perform a parallel fetch in another
> thread.
I intend to do that in my second version. That's one thing most caching
systems should do but don't.
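The waiting and parallel-fetch behaviour could be sketched with a mutex and
condition variable (the class name and API are my assumptions): a second
request for a key already being fetched blocks until that fetch completes,
while a request for a different key fetches concurrently.

```ruby
# Sketch of request coalescing: a second request for a key that is
# already being fetched waits for that fetch; a different key proceeds
# in parallel. Error handling in the fetch block is omitted for brevity.
class CoalescingCache
  def initialize
    @store    = {}
    @inflight = {}
    @lock     = Mutex.new
    @cond     = ConditionVariable.new
  end

  def fetch(key)
    @lock.synchronize do
      loop do
        return @store[key] if @store.key?(key)  # hit: no fetch at all
        break unless @inflight[key]             # miss, nobody fetching
        @cond.wait(@lock)                       # someone is: wait
      end
      @inflight[key] = true
    end
    value = yield(key)         # the actual retrieval, outside the lock
    @lock.synchronize do
      @store[key] = value
      @inflight.delete(key)
      @cond.broadcast          # wake every waiter for this key
    end
    value
  end
end
```

Because the retrieval happens outside the lock, fetches for distinct keys
naturally run in parallel threads.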
> I think it would be much easier to deal with this sort of threading issue
> when the cache is a separate object from the protocol.
I don't think it'll be too difficult either way. The cache will be
separate, and I'm thinking of a more-or-less callback system for things
to be added to the cache -- an object can register that it will be
adding something to the cache when retrieval is complete. I think
having the cache /do/ as little work as possible (such as the actual
retrieval) is saner. Partly, this is because one could theoretically be
caching outbound requests as well, a la Apache's mod_cache, and if those
are dynamically generated, a callback system would keep things simpler
than having the cache do the retrieval itself.
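The register-then-add callback idea might look roughly like this (all
names are hypothetical; the point is only that the cache hands out a
completion hook instead of doing any retrieval itself):

```ruby
# Sketch: the cache does no retrieval. A retriever (or a dynamic-content
# generator, in the mod_cache-style outbound case) registers a pending
# key and receives a callback to invoke when its object is ready.
class CallbackCache
  def initialize
    @store   = {}
    @pending = {}
  end

  # Returns a completion callback; the registrant calls it with the
  # finished object once retrieval (or generation) is done.
  def register(key)
    @pending[key] = true
    lambda do |value|
      @store[key] = value
      @pending.delete(key)
    end
  end

  def pending?(key)
    @pending.key?(key)
  end

  def get(key)
    @store[key]
  end
end
```

The same hook works whether the object came off the wire or was generated
locally, which is what makes the outbound-caching case simpler.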
> In fact, I think I would break it up a bit more: have the raw cache as one
> object (very simple: just 'put' and 'fetch' methods into a hash, but it has
> its own semaphore for protecting concurrent accesses), and a cache manager
> which takes the incoming 'get' requests, checks the cache, and if necessary
> performs the actual fetch before returning the object.
Basically, I think I'll let Net::HTTP#get (and other methods) be
extended to use the cache, effectively making them the cache manager,
and the base cache will be more or less four methods:
- get
- put
- valid?
- delete
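A minimal sketch of that four-method base cache -- hash-backed, under my
own naming, with valid? as the spot where the HTTP/1.1 freshness metadata
would eventually be consulted:

```ruby
# Sketch of the four-method base cache; the extended Net::HTTP#get
# plays cache manager on top of it. Hash-backed and deliberately dumb.
class BaseCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def put(key, value)
    @store[key] = value
  end

  # A real implementation would check freshness metadata (the HTTP
  # headers) here rather than mere presence.
  def valid?(key)
    @store.key?(key)
  end

  def delete(key)
    @store.delete(key)
  end
end
```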
Thank you /very/ much for the insight, by the way. My idea's
implementation is getting cleaner in my head as I talk about this.
Ari