client to upload big files via https and get progress info



News123

Hi,

I'd like to perform huge file uploads via https.
I'd like to make sure:
- that I can obtain upload progress info (sometimes the network is very slow)
- that (if the file exceeds a certain size) I don't have to
read the entire file into RAM.

I found ActiveState's recipe 146306, which constructs the whole
multipart message in RAM first and then sends it in one chunk.


I found a server-side solution that writes out the uploaded file
chunk-wise ( http://webpython.codepoint.net/mod_python_publisher_big_file_upload ).



If I just wanted to have progress info, then I could probably
just split line 16 of ActiveState's recipe ( h.send(body) )
into multiple sends, right?

chunksize = 1024
for i in range(0, len(body), chunksize):
    h.send(body[i:i+chunksize])
    show_progressinfo()


But how could I create the body step by step?
I wouldn't know the Content-Length up front.
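One answer to the Content-Length question: in a multipart body the boundary and per-part headers are fixed strings, so the total length can be computed from the file size alone, and the body streamed piece by piece. A minimal sketch, assuming nothing beyond the standard library; the boundary string, field name, and helper names are invented for illustration:

```python
import os

def multipart_parts(path, field="file", boundary="----uploadBoundary1234"):
    """Build the multipart prologue/epilogue and compute the total
    Content-Length without reading the file into RAM."""
    prologue = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n" % (boundary, field, os.path.basename(path))
    ).encode("ascii")
    epilogue = ("\r\n--%s--\r\n" % boundary).encode("ascii")
    total = len(prologue) + os.path.getsize(path) + len(epilogue)
    return prologue, epilogue, total

def iter_body(path, prologue, epilogue, chunksize=64 * 1024):
    """Yield the body chunk by chunk, so only one chunk is in RAM."""
    yield prologue
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            yield chunk
    yield epilogue
```

With these two helpers you can putrequest()/putheader("Content-Length", str(total)) on an HTTPSConnection, then send() each yielded piece and update the progress display between sends.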

thanks in advance



N
 


Aahz

News123 said:
I'd like to perform huge file uploads via https.
I'd like to make sure,
- that I can obtain upload progress info (sometimes the nw is very slow)
- that (if the file exceeds a certain size) I don't have to
read the entire file into RAM.

Based on my experience with this, you really need to send multiple
requests (i.e. "chunking"). There are ways around this (you can look
into curl's resumable uploads), but you will need to maintain state no
matter what, and I think that chunking is the best/simplest.
 

James Mills

News123 said:
I'd like to perform huge file uploads via https.
[snip]
But how could I create body step by step?
I wouldn't know the content-length up front?

My suggestion is to find some tool that can
send multiple chunks of data. A non-blocking
I/O library/tool might be useful here (e.g. Twisted or similar).

cheers
James
 

News123

Hi Aahz,

Aahz said:
Based on my experience with this, you really need to send multiple
requests (i.e. "chunking"). There are ways around this (you can look
into curl's resumable uploads), but you will need to maintain state no
matter what, and I think that chunking is the best/simplest.

I agree I need chunking. (The question is just on which level of the
protocol.)

I just don't know how to make a chunkwise file upload or what library is
best.

Can you recommend any libraries or do you have a link to an example?


I'd like to avoid making separate https POST requests for the chunks
(at least if the underlying module does NOT support keep-alive connections).


I made some tests with high-level chunking (separate sequential https
POST requests).
What I noticed is a rather high penalty in data throughput.
The reason is probably that each request opens its own https connection
and that either the network driver or the TCP/IP stack doesn't allocate
enough bandwidth to my request.

Therefore I'd like to do the chunking on a 'lower' level.
One option would be an https module which supports keep-alive;
the other would be a library which creates the http POST body
chunk by chunk.
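For the single-connection, no-Content-Length case, HTTP/1.1 offers Transfer-Encoding: chunked: each chunk goes on the wire as a hex length line, CRLF, the payload, CRLF, and a zero-length chunk terminates the body. A sketch with Python's http.client; the host, URL, and the assumption that the server accepts chunked request bodies are all hypothetical:

```python
import http.client

def encode_chunk(data):
    """Wire format of one HTTP/1.1 chunk: hex size, CRLF, payload, CRLF."""
    return ("%x\r\n" % len(data)).encode("ascii") + data + b"\r\n"

def upload_chunked(host, url, path, chunksize=64 * 1024, progress=None):
    """POST a file over one https connection without knowing its
    Content-Length up front."""
    conn = http.client.HTTPSConnection(host)
    conn.putrequest("POST", url)
    conn.putheader("Transfer-Encoding", "chunked")
    conn.endheaders()
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if not chunk:
                break
            conn.send(encode_chunk(chunk))
            sent += len(chunk)
            if progress is not None:
                progress(sent)          # hook for the progress display
    conn.send(b"0\r\n\r\n")             # zero-length chunk ends the body
    return conn.getresponse()
```

Note that some proxies and older servers reject chunked request bodies, so this only helps where the whole path speaks HTTP/1.1.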


What do others do for huge file uploads?
(The uploader might be connected via Ethernet, WLAN, UMTS, EDGE, GPRS.)

N
 

Sean DiZazzo

News123 said:
[snip]
What do others do for huge file uploads?
(The uploader might be connected via Ethernet, WLAN, UMTS, EDGE, GPRS.)

You could also just send the file in one big chunk and give yourself
another avenue to read the size of the file on the server. Maybe a
webservice that you call with the name of the file that returns its
percent complete, or it could just return bytes on disk and you do the
math on the client side. Then you just forget about the transfer and
query the file size whenever you want to know... or on a schedule.

~Sean
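The polling idea above separates the transfer from the progress display: the client asks the server how many bytes have landed and does the percentage math itself. A sketch; the status URL and its JSON field name are hypothetical, only the client-side arithmetic is shown as-is:

```python
import json
import urllib.request

def percent_complete(bytes_on_disk, total_bytes):
    """Client-side math: how far along is the upload?"""
    return 100.0 * min(bytes_on_disk, total_bytes) / total_bytes

def poll_progress(status_url, total_bytes):
    """Ask a (hypothetical) webservice for the bytes written so far."""
    with urllib.request.urlopen(status_url) as resp:
        on_disk = json.load(resp)["bytes_on_disk"]   # assumed field name
    return percent_complete(on_disk, total_bytes)
```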
 

Sean DiZazzo

Sean DiZazzo said:
You could also just send the file in one big chunk and give yourself
another avenue to read the size of the file on the server.
[snip]

Oops... that doesn't help with the other requirements. My suggestion
is to not use https. I don't think it was created to move around
large pieces of data; lots of small pieces, rather. SFTP?
 


J.O. Aho

News123 said:
What do others do for huge file uploads
The uploader might be connected via ethernet, WLAN, UMTS, EDGE, GPRS. )

In those cases where I have had to move big files it's been scp when
you just have to push a new file; in cases where it's a question of
keeping two directories synced, it's rsync over ssh.
The latter I have never done in Python.
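Driving rsync over ssh from Python usually just means shelling out. A sketch; the user, host, and paths are placeholders, and the command is only built (not run) unless you flip the flag:

```python
import subprocess

def rsync_over_ssh(src_dir, user, host, dest_dir, dry_run=True):
    """Build (and optionally run) an rsync-over-ssh command.
    -a: archive mode, -z: compress, -e ssh: tunnel over ssh."""
    cmd = ["rsync", "-az", "-e", "ssh",
           src_dir.rstrip("/") + "/",
           "%s@%s:%s/" % (user, host, dest_dir.rstrip("/"))]
    if dry_run:
        return cmd
    return subprocess.run(cmd, check=True)
```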
 

News123

Hi Sean,

Sean DiZazzo said:
Oops... that doesn't help with the other requirements. My suggestion
is to not use https. I don't think it was created to move around
large pieces of data; lots of small pieces, rather. SFTP?


I had to check, but I guess sftp is not exactly suitable for my use case.

My problem:
- the whole communication is intended to work like a drop box
- one can upload files
- one cannot see what one has uploaded before
- there is no way to accidentally overwrite a previous upload, etc.
- I don't know enough about sftp servers to know how I could configure
one to act as a drop box.

That's much easier to hide behind an https server than behind an
out-of-the-box sftp server.



N
 

News123

Hi James,

James said:
My suggestion is to find some tools that can
send multiple chucks of data. A non-blocking
i/o library/tool might be useful here (eg: twisted or similar).

I have never used Twisted so far.
Perhaps it's time to look at it.


bye


N
 


News123

Hi J,


J.O. Aho said:
In those cases where I have had to move big files it's been scp when
you just have to push a new file; in cases where it's a question of
keeping two directories synced, it's rsync over ssh.
The latter I have never done in Python.


I agree. From home this is also what I do.
scp / rsync.


However I'd like to use https, as http/https are the two ports that are
almost everywhere accessible (even with proxies / firewalls, etc.).



N
 
