Proxying downloads

Martin Marcher

Hello,

This is more of a recipe question. I'm working on a proxy that will
download a file for a client. The case that doesn't cause problems is:

Alice (Client)
Bob (Client)
Sam (Server)

1 Alice asks Sam for "foobar.iso"
2 Sam can't find "foobar.iso" in "cachedir"
3 Sam requests "foobar.iso" from the uplink
4 Sam now saves each chunk received to "cachedir/foobar.iso"
5 At the same time Sam forwards each chunk to Alice.

But I can't figure out how I would solve the following:

1 Alice asks Sam for "foobar.iso"
2 Sam can't find "foobar.iso" in "cachedir"
3 Sam requests "foobar.iso" from uplink
4 Sam saves and forwards to Alice
5 At about 30 % of the download Bob asks Sam for "foobar.iso"
6 How do I serve Bob now?

Because the internal link is _a lot_ faster than the uplink, Bob
will probably reach the end of (the local) "foobar.iso" before Sam has
received all of "foobar.iso" from the uplink. So Bob will end up with an
incomplete file...

How do I solve that? The already downloaded data should of course be
served internally.

The solutions I can think of are:
* Some kind of subscriber list for the file in question
  * That is, serve internally and, if the state of "foobar.iso" is "in
    progress", switch to receiving chunks directly from Sam as they come
    down the link
  * How would I realize this switch from internal serving to pass-through
    of chunks?

* Send an acknowledgement (lie to the client that we have this file in
  the cache), wait until the download is finished and then serve the file
  from the internal cache
  * This could lead to timeouts for very large files, at least I think so

* Forget about all of it and just pass through from the uplink, with a new
  request, as long as files are in progress. In the worst case this would
  download the file n times, where n is the number of clients.
  * I guess that's the easiest one but also the least desirable solution.

I hope I explained my problem in an understandable way.

Any hints are welcome.
thanks
martin
 
Jeff

You use a temp directory to store the file while downloading, then
move it to the cache so the addition of the complete file is atomic.
The file name of the temp file should be checked to validate that you
don't overwrite another process' download.
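
A minimal sketch of the temp-then-move step, assuming Python 3 and that the temp directory and the cache are on the same filesystem (otherwise `os.replace` can fail instead of moving atomically). The function name `save_atomically` and its parameters are made up for illustration; `tempfile.mkstemp` is used here so that each download gets a unique temp file name and cannot clobber another one:

```python
import os
import tempfile


def save_atomically(chunks, tmp_dir, cache_dir, name):
    """Write chunks to a unique temp file, then move it into the cache.

    `chunks` is any iterable of bytes objects (hypothetical: whatever
    the uplink download yields).
    """
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(cache_dir, exist_ok=True)
    # mkstemp creates a new, uniquely named file, so two concurrent
    # downloads never write to the same temp file.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        final_path = os.path.join(cache_dir, name)
        # The complete file appears in the cache in a single step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temp file behind.
        os.unlink(tmp_path)
        raise
    return final_path
```

With this shape, anything found in the cache directory is a complete file; partial data only ever lives in the temp directory.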

Currently downloading urls should be registered with the server
process (a simple list or set would work). New requests should be
checked against that; if there is a matching url in there, the process
must wait until that download is finished and that file should be
delivered to both Alice and Bob.

You need to store the local file path together with the url it was
downloaded from, and check against that pair when a request is made;
there might be two foobar.iso files on the Internet or the network, and
they may be different (such as in differently versioned directories).
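
One way the registry described above could look, as a rough sketch assuming a threaded server. The class `DownloadRegistry`, its methods, and the helper `cache_name` are hypothetical names, and hashing the full url is just one possible way to keep two different foobar.iso files apart:

```python
import hashlib
import threading


class DownloadRegistry:
    """Tracks the urls that are currently being downloaded."""

    def __init__(self):
        self._lock = threading.Lock()
        # url -> Event that is set once that download has finished
        self._in_progress = {}

    def claim(self, url):
        """Return (is_owner, event).

        The first caller for a url becomes the owner and must download
        the file; later callers get the same event and can wait on it.
        """
        with self._lock:
            event = self._in_progress.get(url)
            if event is not None:
                return False, event
            event = threading.Event()
            self._in_progress[url] = event
            return True, event

    def finish(self, url):
        """Called by the owner after the file was moved into the cache."""
        with self._lock:
            event = self._in_progress.pop(url)
        event.set()  # wake up every request waiting for this url


def cache_name(url):
    """Derive the cache file name from the full url, not the base name."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()
```

A request handler would call `claim(url)`; the owner downloads and then calls `finish(url)`, while everyone else calls `event.wait()` and serves the file from the cache afterwards. Note that waiting clients receive nothing until the download completes, so the timeout concern for very large files still applies.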
 
Martin Sand Christensen

> But I can't figure out how I would solve the following:
> 1 Alice asks Sam for "foobar.iso"
> 2 Sam can't find "foobar.iso" in "cachedir"
> 3 Sam requests "foobar.iso" from uplink
> 4 Sam saves and forwards to Alice
> 5 At about 30 % of the download Bob asks Sam for "foobar.iso"
> 6 How do I serve Bob now?

Let every file in your download cache be represented by a Python object.
Instead of streaming the file directly to the clients, you can stream
the objects. The object will know if the file it represents has finished
downloading or not, where the file is located etc. This way you can
also, for the sake of persistence, keep partially downloaded files
separate from the completely downloaded files, as per a previous
suggestion, so that you won't start serving half files after a crash,
and it'll be completely transparent in all code except for your proxy
file objects.
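
The object-per-file idea could be sketched roughly like this, assuming a threaded server where one thread downloads and each client is served from its own thread. The class name `CachedFile` and its methods are invented for illustration and are not an existing API:

```python
import threading


class CachedFile:
    """One cache entry, whether complete or still being downloaded."""

    def __init__(self, path):
        self.path = path
        self._cond = threading.Condition()
        self._size = 0       # bytes written and flushed so far
        self._done = False   # True once the uplink download has ended
        self._out = open(path, "wb")

    def append(self, chunk):
        """Called by the downloader for every chunk from the uplink."""
        self._out.write(chunk)
        self._out.flush()  # make the bytes visible to readers of the file
        with self._cond:
            self._size += len(chunk)
            self._cond.notify_all()

    def finish(self):
        """Called by the downloader when the uplink transfer is complete."""
        self._out.close()
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def stream(self, block_size=64 * 1024):
        """Yield the whole file to one client.

        Data already on disk is served immediately; when the client
        catches up with the download, it waits for the next chunk.
        """
        offset = 0
        with open(self.path, "rb") as src:
            while True:
                with self._cond:
                    self._cond.wait_for(
                        lambda: self._size > offset or self._done)
                    available = self._size
                if available == offset:
                    return  # download finished and everything was sent
                while offset < available:
                    data = src.read(min(block_size, available - offset))
                    offset += len(data)
                    yield data
```

Alice and Bob would each iterate over their own `stream()` call on the same object: Bob quickly receives the first 30 % from disk and then proceeds at uplink speed, without the client code having to know whether the file is complete.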

Martin
 
