Writing multiple files with one stream


davidjdoherty

Hi,
I have a bit of a performance problem. I have 5 live servers that are
synchronised with the same files. When I upload, I am uploading the same
files to each of the 5 servers. Regularly, I need to upload a large
number (about 50) of small files (3 KB each) to all 5 servers. 3 of the 5
servers are very quick and it only takes a few seconds, but for the
slow servers (based in Hong Kong and Australia; I'm in the UK) it can take 30
minutes. This is because my program is opening up a new stream for each
file, and this connection time is what is causing the slow transfer.

Is there any way to open only one stream, but write multiple files
across it?

I thought about using the Java ZIP API to zip the files, upload the
archive, and then unzip it. What is the performance like when you unzip
a zipped folder sitting in a remote directory?
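The zipping half of this idea could be sketched with `java.util.zip.ZipOutputStream`. This is a minimal sketch, assuming the files to send are given as a list of `Path`s with unique file names; the class and method names (`ZipBundle`, `zipFiles`) are hypothetical. It only produces the archive: something on the remote side would still have to unzip it, which is exactly the open question here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBundle {
    // Writes every file in 'files' into one ZIP archive at 'archive'.
    // Entry names are the bare file names (assumed unique; hypothetical simplification).
    public static void zipFiles(List<Path> files, Path archive) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(archive))) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // copy the file's bytes into the current entry
                zip.closeEntry();
            }
        } // closing the stream writes the ZIP central directory
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bundle");
        Path a = Files.writeString(dir.resolve("a.txt"), "hello");
        Path b = Files.writeString(dir.resolve("b.txt"), "world");
        Path archive = dir.resolve("bundle.zip");
        zipFiles(List.of(a, b), archive);
        System.out.println("Wrote " + Files.size(archive) + " bytes to " + archive);
    }
}
```

One archive means one file to create on the share instead of fifty, but see the replies below for why the unzip step matters when the remote side is only a mapped drive.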

Is there an easy way?

Cheers,
Dave
 

Oliver Wong

Hi,
> I have a bit of a performance problem. I have 5 live servers that are
> synchronised with the same files. When I upload, I am uploading the same
> files to each of the 5 servers. Regularly, I need to upload a large
> number (about 50) of small files (3 KB each) to all 5 servers. 3 of the 5
> servers are very quick and it only takes a few seconds, but for the
> slow servers (based in Hong Kong and Australia; I'm in the UK) it can take 30
> minutes. This is because my program is opening up a new stream for each
> file, and this connection time is what is causing the slow transfer.

This doesn't make sense to me. You first claim that the slow transfers
are due to the servers themselves being slow, then you later claim that the
slow transfers are due to opening a new stream for each file. If the
problem were actually the latter, then all servers would be perceived
to be equally slow. So which is it?

> Is there any way to open only one stream, but write multiple files
> across it?

You can write anything you want to a stream; the question is whether the
server on the other side knows what to do with the data in that stream.
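To make this concrete: one way to put several files on a single stream is to prefix each file with its name and length, so the receiver can tell where one file ends and the next begins. The format below (a file count, then name, length and bytes per file, written with `DataOutputStream`) is a hypothetical one invented for illustration, not an existing protocol, and the class name `FramedFiles` is likewise made up; the receiving side would need the matching `readAll` code running on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FramedFiles {
    // Hypothetical wire format: int count, then per file: UTF name, int length, raw bytes.
    public static void writeAll(Map<String, byte[]> files, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(files.size());
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            data.writeUTF(e.getKey());          // file name
            data.writeInt(e.getValue().length); // payload length
            data.write(e.getValue());           // payload bytes
        }
        data.flush();
    }

    // Reads back what writeAll produced, preserving the original order.
    public static Map<String, byte[]> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int count = data.readInt();
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String name = data.readUTF();
            byte[] payload = new byte[data.readInt()];
            data.readFully(payload); // blocks until the whole file has arrived
            files.put(name, payload);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("one.txt", "first file".getBytes(StandardCharsets.UTF_8));
        files.put("two.txt", "second file".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for a socket
        writeAll(files, wire);
        Map<String, byte[]> received = readAll(new ByteArrayInputStream(wire.toByteArray()));
        for (Map.Entry<String, byte[]> e : received.entrySet()) {
            System.out.println(e.getKey() + ": " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

The length prefix is what lets one stream carry many files; without it (or some equivalent delimiter), the receiver has no way to split the bytes back into files.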
> I thought about using the Java ZIP API to zip the files, upload the
> archive, and then unzip it. What is the performance like when you unzip
> a zipped folder sitting in a remote directory?

The performance of a program depends (among other things) on the
computer running that program. Typically, zipping a file, sending it, and
unzipping it is faster than sending the uncompressed file, but that depends
on the compressibility of the files involved.
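The point about compressibility is easy to check with `java.util.zip.Deflater`: repetitive data shrinks a lot, while random bytes barely shrink at all (and can even grow slightly). A minimal sketch; the class name and the two sample inputs are arbitrary, hypothetical choices, and real 3 KB files will fall somewhere in between.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class Compressibility {
    // Returns the size in bytes of 'input' after DEFLATE (zlib format) compression.
    public static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native resources
        return out.size();
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[3 * 1024];
        Arrays.fill(repetitive, (byte) 'x'); // highly compressible
        byte[] random = new byte[3 * 1024];
        new Random(42).nextBytes(random);    // essentially incompressible

        System.out.println("repetitive: 3072 -> " + compressedSize(repetitive));
        System.out.println("random:     3072 -> " + compressedSize(random));
    }
}
```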
> Is there an easy way?

It's not clear what options are available to you. You haven't specified,
for example, what control you have over the five servers.

- Oliver
 

Chris Uppal

> I have a bit of a performance problem. I have 5 live servers that are
> synchronised with the same files. When I upload, I am uploading the same
> files to each of the 5 servers. Regularly, I need to upload a large
> number (about 50) of small files (3 KB each) to all 5 servers. 3 of the 5
> servers are very quick and it only takes a few seconds, but for the
> slow servers (based in Hong Kong and Australia; I'm in the UK) it can take 30
> minutes. This is because my program is opening up a new stream for each
> file, and this connection time is what is causing the slow transfer.

How are you uploading the files, and what sort of access do you have to the
servers? Can you execute a program on them? If not, then you are probably
hosed...

-- chris
 

davidjdoherty

I have access to the servers via 5 mapped network drives on Windows. So
when I upload the files, I'm just opening up file output streams that
refer to "S:\filename", etc...
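For reference, an upload of the kind described here presumably looks something like the loop below: one output stream (opened here by `Files.copy`) per file per drive, so 50 files to 5 drives means 250 separate create-and-write operations on remote shares. This is a reconstruction under assumptions, not the poster's actual code; the class name `MirrorUpload` is hypothetical, and the drive roots are parameters, so a path like `S:\` would only ever be passed in, never hard-coded.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class MirrorUpload {
    // Copies each local file to the same file name under every target root.
    // Every copy opens and closes its own output stream on the target share.
    public static int copyToAll(List<Path> files, List<Path> targetRoots) throws IOException {
        int copies = 0;
        for (Path root : targetRoots) {
            for (Path file : files) {
                Path dest = root.resolve(file.getFileName().toString());
                Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
                copies++;
            }
        }
        return copies;
    }

    public static void main(String[] args) throws IOException {
        // In real use the roots would be the mapped drives, e.g. Paths.get("S:\\").
        // Here two temporary directories stand in for them.
        List<Path> roots = List.of(Files.createTempDirectory("driveA"),
                                   Files.createTempDirectory("driveB"));
        Path src = Files.createTempDirectory("src");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            files.add(Files.writeString(src.resolve("file" + i + ".txt"), "data " + i));
        }
        System.out.println("Performed " + copyToAll(files, roots) + " copies");
    }
}
```

On a high-latency link, the cost of each of those per-file operations on the share, rather than the 3 KB of data, is what dominates.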

I could write a server-side program that allows a client to open a
connection and send multiple files over that connection, but that would
involve me convincing other people to allow me this sort of access,
which could be arduous. I was hoping for a quicker (or lazier) fix.
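If permission to run something on the remote machines were granted, the receiving end could be quite small. A minimal sketch, assuming a plain TCP connection over which the client streams a ZIP archive: the server accepts one connection and unpacks the entries with `ZipInputStream` into a target directory. The class name `ZipReceiver`, the use of ZIP as the framing format, and the port handling are all hypothetical choices; there is no authentication or retry logic, and entry names that would escape the target directory are skipped.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipReceiver {
    // Server side: accepts one connection and extracts the ZIP stream it carries
    // into targetDir. Returns the number of files written.
    public static int receiveOnce(ServerSocket server, Path targetDir) throws IOException {
        Path base = targetDir.toAbsolutePath().normalize();
        int count = 0;
        try (Socket client = server.accept();
             ZipInputStream zip = new ZipInputStream(client.getInputStream())) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path dest = base.resolve(entry.getName()).normalize();
                if (entry.isDirectory() || dest.equals(base) || !dest.startsWith(base)) {
                    continue; // skip directory entries and names such as "../x"
                }
                Files.createDirectories(dest.getParent());
                // ZipInputStream reports end-of-stream at the end of each entry,
                // so this copies exactly one file.
                Files.copy(zip, dest, StandardCopyOption.REPLACE_EXISTING);
                count++;
            }
            // Drain the rest of the archive (the central directory) so the client
            // can finish writing before this side closes the socket.
            client.getInputStream().transferTo(OutputStream.nullOutputStream());
        }
        return count;
    }

    // Client side: sends the given files as one ZIP stream over one connection.
    public static void send(String host, int port, Path... files) throws IOException {
        try (Socket socket = new Socket(host, port);
             ZipOutputStream zip = new ZipOutputStream(
                     new BufferedOutputStream(socket.getOutputStream()))) {
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the central directory and flushes
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("src");
        Path a = Files.writeString(src.resolve("a.txt"), "alpha");
        Path b = Files.writeString(src.resolve("b.txt"), "beta");
        Path target = Files.createTempDirectory("target");
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    send("localhost", port, a, b);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            System.out.println("Received " + receiveOnce(server, target) + " files into " + target);
            sender.join();
        }
    }
}
```

The whole batch travels over one TCP connection, so the long-distance connection setup is paid once rather than once per file.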

Oliver: I referred to two of the servers as slow because they are
located far away. The servers themselves are fast, but performing
operations such as opening a folder through my mapped network drive
seems slow due to the overhead of creating a connection to a location
that is very far away from me. (Two of the other servers are
located onsite, and the third is just a couple of miles away, so operations
are very noticeably quicker.)
 

Oliver Wong

> I have access to the servers via 5 mapped network drives on Windows. So
> when I upload the files, I'm just opening up file output streams that
> refer to "S:\filename", etc...
>
> I could write a server-side program that allows a client to open a
> connection and send multiple files over that connection, but that would
> involve me convincing other people to allow me this sort of access,
> which could be arduous. I was hoping for a quicker (or lazier) fix.
>
> Oliver: I referred to two of the servers as slow because they are
> located far away. The servers themselves are fast, but performing
> operations such as opening a folder through my mapped network drive
> seems slow due to the overhead of creating a connection to a location
> that is very far away from me. (Two of the other servers are
> located onsite, and the third is just a couple of miles away, so operations
> are very noticeably quicker.)

I wish you had mentioned the mapped network drives earlier. It makes a big
difference, and is probably not what most people envisioned from your
original description. Zipping will make things worse in your situation. Let's
say a file is 5 MB when uncompressed, and 1 MB when compressed.

The "Don't zip it" solution involves you sending the 5MB file to the remote
folder. And that's it. So total bandwidth used is 5MB.

The "Zip it" solution involves you sending the 1MB file to the remote
folder. Then, you re-read that 1MB back locally. Then you write the
uncompressed version, which is 5MB, back to the server. Total bandwidth used
is 7MB.
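The same arithmetic holds for any compression ratio. Writing S for the uncompressed size and r > 1 for the compression ratio (so the archive is S/r), and assuming, as above, that the unzipping is done from the local machine through the mapped drive, the bytes crossing the network are:

```latex
\begin{aligned}
\text{no zip:}\quad & S \\
\text{zip:}\quad & \underbrace{S/r}_{\text{upload}}
  + \underbrace{S/r}_{\text{read back}}
  + \underbrace{S}_{\text{write unzipped}}
  = S\left(1 + \frac{2}{r}\right) > S
\end{aligned}
```

With S = 5 MB and r = 5 this gives the 7 MB above, and since 2/r is always positive, zipping in this setup never beats sending the files directly, however well they compress.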

The term "overhead" is usually used to mean "additional costs due to the
strategy I'm using". In your case, these are not "additional" costs, but
actually the MAIN cost of the task you're trying to perform; namely, sending
files across a network, or getting a list of files in a remote folder.

- Oliver
 

davidjdoherty

Yeah, I got the feeling that I wouldn't be able to zip it and have the
remote machine unzip it for me. I was hoping that there might have been
some way to do it. I guess I'll have to see if I can run a server on
those remote machines.

Sorry if I was using the term overhead incorrectly. I was using the
term in its general meaning and not its specific meaning in terms of
data communication over networks. If you want to create a syntax to
differentiate between different meanings of the same word I am happy to
use it. Maybe something like: overhead[general], overhead[data
communication over networks]. This way we can avoid pedantic
discussions in the future. Sorry to be so rude, but if you understood
me anyway, then what was the point of bringing it up?

Thanks for the help,
Dave
 

Chris Uppal

> I have access to the servers via 5 mapped network drives on Windows. So
> when I upload the files, I'm just opening up file output streams that
> refer to "S:\filename", etc...

Hmm. But how is the connection made? Is the remote "drive" a WebDAV drive,
an SMB/CIFS connection (a normal Windows/Samba shared drive), an FTP
connection, or what? And is the connection layered over something like SSL?
I ask because a setup+transfer time of over half a minute per 3K file doesn't
make any sense at all. And it makes even less sense if the connection is
layered over SSL, since the connection then should be established only once (at
the TCP level) and the other stuff all multiplexed over that. It sounds as if
you might have an undiagnosed problem with your network, in which case the
right thing is to fix that.
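One way to find out where the time goes is to time each small write on its own (30 minutes for 50 files is roughly 36 seconds per file). A minimal sketch, assuming the target directory is passed on the command line, for example a scratch folder on one of the mapped drives; the class name `WriteTimer`, the file names, and the 3 KB dummy payload are hypothetical, and it leaves its test files behind. If each write to a remote drive really takes tens of seconds while local writes take milliseconds, that points at the network or share setup rather than the Java code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class WriteTimer {
    // Writes 'count' files of 'size' bytes into 'dir' and returns how long
    // each individual write took, in milliseconds.
    public static List<Long> timeWrites(Path dir, int count, int size) throws IOException {
        byte[] payload = new byte[size];
        List<Long> millis = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            Files.write(dir.resolve("timing-test-" + i + ".tmp"), payload);
            millis.add((System.nanoTime() - start) / 1_000_000);
        }
        return millis;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java WriteTimer S:\scratch
        // With no argument, a local temporary directory is used as a baseline.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("timing");
        List<Long> millis = timeWrites(dir, 5, 3 * 1024);
        for (int i = 0; i < millis.size(); i++) {
            System.out.println("file " + i + ": " + millis.get(i) + " ms");
        }
    }
}
```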

> I could write a server-side program that allows a client to open a
> connection and send multiple files over that connection, but that would
> involve me convincing other people to allow me this sort of access,
> which could be arduous. I was hoping for a quicker (or lazier) fix.

You might find it easier to persuade the powers-that-be if their alternative is
for /them/ to do some work to fix the network problem ;-)

It might be worth checking that you are not attempting to transport all 50
files simultaneously (which could be overloading something). Other than that
(and assuming you are stuck with the network as is) there is no way that you
can fix this unilaterally.
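If the upload code does start all 50 copies at once, a small fixed-size thread pool caps how many are in flight at any moment. A minimal sketch using `java.util.concurrent.ExecutorService`; the class name `BoundedUpload` and the pool size of 2 in the demo are arbitrary, hypothetical choices, and whether any concurrency helps on a high-latency link is something to measure rather than assume.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedUpload {
    // Copies every file into targetDir using at most maxParallel threads,
    // so no more than maxParallel copies are in progress at the same time.
    public static void copyAll(List<Path> files, Path targetDir, int maxParallel)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (Path file : files) {
                results.add(pool.submit(() -> {
                    try {
                        Files.copy(file, targetDir.resolve(file.getFileName().toString()),
                                StandardCopyOption.REPLACE_EXISTING);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : results) {
                f.get(); // waits for each copy and rethrows any failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("src");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            files.add(Files.writeString(src.resolve("f" + i + ".txt"), "payload " + i));
        }
        Path target = Files.createTempDirectory("target"); // stands in for a mapped drive
        copyAll(files, target, 2);
        try (var listing = Files.list(target)) {
            System.out.println("Copied " + listing.count() + " files");
        }
    }
}
```

Setting `maxParallel` to 1 gives a plain sequential upload, which is the simplest way to rule concurrency out as a cause.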

-- chris
 

Oliver Wong

> Sorry to be so rude, but if you understood
> me anyway, then what was the point of bringing it up?

In the particular message you are referring to, I had enough context to
"guess at" what you really meant. However, if I hadn't pointed out my
confusion to you, you might have, in a future message, used words
incorrectly again, but this time NOT provided sufficient context for
people to guess at what you really meant, and thus ended up confusing
everybody. And when people are confused about what you're saying, it's more
difficult for them to help you. From my point of view, I was doing you a
favour, by making it easier for people to understand you, and thus making it
easier for you to get the help you want.

A similar thing happened to me a while ago. I was talking about
encryption channels, and I referred to the stuff through which the
communication happened as "mediums". Someone then pointed out to me that
"mediums" is the plural for the term which refers to a person who
communicates with the dead, and that I probably meant "media". Yeah, it was
a bit embarrassing for me, but he was right: I had used the wrong word.
Because of the context (encryption channels), it was clear what I meant, but
in another context, it might not have been so clear. So I appreciated him
correcting me.

I'm not saying you HAVE to appreciate what I did, or that you "owe" me a
favor in return or anything. I'm just trying to explain how I saw the
situation, and that I didn't mean any harm.

- Oliver
 

Chris Uppal

I said:
> And it makes even less
> sense if the connection is layered over SSL, since the connection then
> should be established only once (at the TCP level)

In the interest of precision (and since we seem to have got into a discussion
of clear language[*]) I should correct myself. What I meant was "if the
connection is /tunnelled/ over SSL [...etc]" Layering would imply a new SSL
connection for each real connection; tunnelling (as in a VPN) would not.

-- chris

([*] which always interests me)
 
