How to best update remote compressed, encrypted archives incrementally?

Discussion in 'Python' started by robert, Mar 10, 2006.

  1. robert

    robert Guest

    Hello,

    I want to put (incrementally) changed/new files from a big file tree
    "directly,compressed and password-only-encrypted" to a remote backup
    server incrementally via FTP,SFTP or DAV.... At best within a closed
    algorithm inside Python without extra shell tools.
    (The method should work with any protocol which allows somehow read,
    write & seek to a remote file.)
    On the server and the transmission line there should never be
    unencrypted data.

    Usually one would create a big archive, then compress, then encrypt
    (e.g. with gpg -c file), then transfer. However, that method needs a
    lot of free temporary disk space and, most costly of all, always
    transfers the complete archive.
    With proven block-file encryption methods like GPG I don't get the
    flexibility needed for my task, I guess?

    The ZIP2 format allows encryption (is this ZIP encryption method
    supported by Python in any way?). It would be possible to
    navigate in a remote ZIP (e.g. over FTP). But ZIP encryption is also
    known to be very weak and can be cracked within some hours of computing
    time, at least when every file uses the same password.

    Another method would be to produce slice files: create incremental
    TAR/ZIP archives, encrypt them locally with "gpg -c" and upload them as
    separate files. Still a fragile setup, which allows only rough control,
    needs a common archive time stamp (comparing individual file attributes
    is not possible), and needs external tools.
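
    The slice idea above could be sketched roughly as follows. This is a
    minimal sketch using only the standard library; the function names
    files_newer_than and build_slice are made up for illustration, and the
    encryption step (e.g. "gpg -c") is not shown. It also shows why the
    approach is rough: a file copied into the tree with an old modification
    time falls below the threshold and is silently skipped.

```python
import io
import os
import tarfile


def files_newer_than(root, threshold):
    """Yield paths under root whose modification time is later than threshold."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > threshold:
                yield path


def build_slice(root, threshold):
    """Return the bytes of a gzip-compressed tar holding only the newer files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(files_newer_than(root, threshold)):
            # Store paths relative to root so the slice can be unpacked anywhere.
            tar.add(path, arcname=os.path.relpath(path, root))
    return buf.getvalue()
```

    The threshold would be the time stamp of the previous slice; the
    returned bytes would then be encrypted and uploaded under a new name.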

    Very nice would be a method which can directly compare against and update
    a single consistent file like
    ftp://..../archive.zip.gpg

    Is something like this possible?

    Robert
     
    robert, Mar 10, 2006
    #1

  2. On Fri, 10 Mar 2006 15:13:07 +0100, robert wrote:

    > Hello,
    >
    > I want to put (incrementally) changed/new files from a big file tree
    > "directly,compressed and password-only-encrypted" to a remote backup
    > server incrementally via FTP,SFTP or DAV.... At best within a closed
    > algorithm inside Python without extra shell tools.


    What do you mean by "closed algorithm"?

    The only thing I can think of is you mean a secret algorithm, one which
    nobody but yourself will know. So let's get this straight... you are
    asking a public newsgroup dedicated to an open-source language for
    somebody to tell you a secret algorithm that only you will know?

    Please tell me I've misunderstood.


    > (The method should work with any protocol which allows somehow read,
    > write & seek to a remote file.)
    > On the server and the transmission line there should never be
    > unencrypted data.


    Break the job into multiple pieces. Your task is:

    - transmit information to the remote server;

    Can you use SSH for that? SSH will use industrial strength encryption,
    likely better than anything you can create.

    - you want to update the files at the other end;

    Sounds like a job for any number of already existing technologies, like
    rsync (which, by the way, already uses ssh for the encrypted transmission
    of data).



    --
    Steven.
     
    Steven D'Aprano, Mar 11, 2006
    #2

  3. robert

    robert Guest

    Steven D'Aprano wrote:
    > On Fri, 10 Mar 2006 15:13:07 +0100, robert wrote:
    >
    >
    >>Hello,
    >>
    >>I want to put (incrementally) changed/new files from a big file tree
    >>"directly,compressed and password-only-encrypted" to a remote backup
    >>server incrementally via FTP,SFTP or DAV.... At best within a closed
    >>algorithm inside Python without extra shell tools.

    >
    >
    > What do you mean by "closed algorithm"?
    >
    > The only thing I can think of is you mean a secret algorithm, one which
    > nobody but yourself will know. So let's get this straight... you are
    > asking a public newsgroup dedicated to an open-source language for
    > somebody to tell you a secret algorithm that only you will know?
    >
    > Please tell me I've misunderstood.


    No. I meant it in terms of 'cohesive': a Python solution without a lot of
    other tools. (Only the password has to be secret.)

    >>(The method should work with any protocol which allows somehow read,
    >>write & seek to a remote file.)
    >>On the server and the transmission line there should never be
    >>unencrypted data.

    >
    >
    > Break the job into multiple pieces. Your task is:
    >
    > - transmit information to the remote server;
    >
    > Can you use SSH for that? SSH will use industrial strength encryption,
    > likely better than anything you can create.


    Yes, SFTP (=SSH) or FTP with TLS (=SSL) are good protocols. They can
    also read/navigate in a remote file and append to a file. But how about
    incremental+encrypted?

    > - you want to update the files at the other end;
    >
    > Sounds like a job for any number of already existing technologies, like
    > rsync (which, by the way, already uses ssh for the encrypted transmission
    > of data).


    As far as I know, rsync cannot update compressed+encrypted data inside an
    existing file(set)?
    In any case, with rsync I would have to keep a duplicate of the backup
    file layout on the local machine (consuming as much space again as the
    files themselves)?

    That's why I ask: how to get all these tasks into a cohesive encrypted
    backup solution that does not waste disk space and network bandwidth?

    Robert
     
    robert, Mar 11, 2006
    #3
  4. On Sat, 11 Mar 2006 11:46:24 +0100, robert wrote:

    >> Sounds like a job for any number of already existing technologies, like
    >> rsync (which, by the way, already uses ssh for the encrypted transmission
    >> of data).

    >
    > As far as I know, rsync cannot update compressed+encrypted into an
    > existing file(set) ?
    > In any case with rsync I would have to have a duplicate of the backup
    > file geometry on the local machine (consuming another magnitude of the
    > file stuff itself) ?


    Let me see if I understand you.

    On the remote machine, you have one large file, which is compressed and
    encrypted. Call the large file "Archive". Archive is made up of a number
    of virtual files, call them A, B, ... Z. Think of Archive as a compressed
    and encrypted tar file.

    On the local machine, you have some, but not all, of those smaller
    files, let's say B, C, D, and E. You want to modify those smaller files,
    compress them, encrypt them, transmit them to the remote machine, and
    insert them in Archive, replacing the existing B, C, D and E.

    Is that correct?

    > That's why I ask: how to get all these tasks into a cohesive encrypted
    > backup solution not wasting disk space and network bandwidth?


    What's your budget for developing this solution? $100? $1000? $10,000?
    Stop me when I get close. Remember, your time is money, and if you are a
    developer, every hour you spend on this is costing your employer anything
    from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
    might value your time as Free.)

    If you have an unlimited budget, you can probably create a solution to do
    this, keeping in mind that compressed/encrypted and modify-in-place
    *rarely* go together.

    If you have a lower budget, I'd suggest you drop the "single file"
    requirement. Hard disks are cheap, less than an Australian dollar a
    gigabyte, so don't get trapped into the false economy of spending $100 of
    developer time to save a gigabyte of data. Using multiple files makes it
    *much* simpler to modify-in-place: you simply replace the modified file.
    Of course the individual files can be compressed and encrypted, or you can
    use a compressed/encrypted file system.
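
    As a rough illustration of the multiple-files approach: keep a small
    manifest of content hashes, compare it with the previous run, and
    re-upload only the files that differ. This is a sketch under
    assumptions; build_manifest, changed_paths and pack_file are
    hypothetical names, files are read fully into memory for brevity, and
    the encryption step is left out.

```python
import hashlib
import os
import zlib


def build_manifest(root):
    """Map each relative path under root to the SHA-256 hex digest of its content."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest


def changed_paths(old_manifest, new_manifest):
    """Return paths that are new or whose digest differs from the old manifest."""
    return sorted(
        path for path, digest in new_manifest.items()
        if old_manifest.get(path) != digest
    )


def pack_file(path):
    """Compress one file; encryption (not shown) would be applied to the result."""
    with open(path, "rb") as fh:
        return zlib.compress(fh.read(), 9)
```

    Each packed file replaces its counterpart on the server, so only the
    changed files travel over the wire.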

    Lastly, have you considered that your attempted solution is completely the
    wrong way to solve the problem? If you explain _what_ you are wanting to
    do, rather than _how_ you want to do it, perhaps there is a better way.


    --
    Steven.
     
    Steven D'Aprano, Mar 11, 2006
    #4
  5. robert

    robert Guest

    Steven D'Aprano wrote:


    > Let me see if I understand you.
    >
    > On the remote machine, you have one large file, which is compressed and
    > encrypted. Call the large file "Archive". Archive is made up of a number
    > of virtual files, call them A, B, ... Z. Think of Archive as a compressed
    > and encrypted tar file.
    >
    > On the local machine, you have some, but not all, of those smaller
    > files, let's say B, C, D, and E. You want to modify those smaller files,
    > compress them, encrypt them, transmit them to the remote machine, and
    > insert them in Archive, replacing the existing B, C, D and E.
    >
    > Is that correct?


    Yes, that is it. In addition, a possibility for (fast) comparison of
    individual files would be optimal.

    >>That's why I ask: how to get all these tasks into a cohesive encrypted
    >>backup solution not wasting disk space and network bandwidth?

    >
    > What's your budget for developing this solution? $100? $1000? $10,000?
    > Stop me when I get close. Remember, your time is money, and if you are a
    > developer, every hour you spend on this is costing your employer anything
    > from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
    > might value your time as Free.)
    >
    > If you have an unlimited budget, you can probably create a solution to do
    > this, keeping in mind that compressed/encrypted and modify-in-place
    > *rarely* go together.
    >
    > If you have a lower budget, I'd suggest you drop the "single file"
    > requirement. Hard disks are cheap, less than an Australian dollar a
    > gigabyte, so don't get trapped into the false economy of spending $100 of
    > developer time to save a gigabyte of data. Using multiple files makes it
    > *much* simpler to modify-in-place: you simply replace the modified file.
    > Of course the individual files can be compressed and encrypted, or you can
    > use a compressed/encrypted file system.
    >
    > Lastly, have you considered that your attempted solution is completely the
    > wrong way to solve the problem? If you explain _what_ you are wanting to
    > do, rather than _how_ you want to do it, perhaps there is a better way.


    So there seems to be a big barrier for that task when encryption is on
    the whole archive. Complex block navigation within a block cipher
    would be required, and apparently no such (handy) code exists
    already. Or is there an encryption/decryption method which you can
    use like a file pipe _and_ which supports 'seek'?
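
    For what it is worth, counter-mode stream encryption does support
    'seek' in principle: the keystream for each block depends only on the
    key, a nonce and the block counter, so any byte range can be encrypted
    or decrypted without touching the rest. The sketch below only
    illustrates that property with the standard library. It uses
    HMAC-SHA256 as a stand-in for a real block cipher, the names keystream
    and xor_at are invented here, and it is not a vetted cipher; a real
    implementation would rather use something like AES in CTR mode from an
    audited library and add integrity protection.

```python
import hashlib
import hmac

BLOCK = 32  # SHA-256 output size in bytes


def keystream(key, nonce, offset, length):
    """Return `length` keystream bytes starting at byte `offset` of the stream."""
    first = offset // BLOCK
    last = (offset + length - 1) // BLOCK
    out = bytearray()
    for counter in range(first, last + 1):
        # Each block depends only on key, nonce and counter, never on earlier data.
        msg = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
    start = offset - first * BLOCK
    return bytes(out[start:start + length])


def xor_at(key, nonce, offset, data):
    """Encrypt or decrypt `data` as if it sat at byte `offset` of the stream."""
    ks = keystream(key, nonce, offset, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

    One caveat: overwriting a region in place reuses the same keystream,
    which reveals the XOR of the old and new plaintext to anyone who saw
    both versions, so modified regions would need a fresh nonce.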

    Thus, a simple method would use a common threshold timestamp or
    archive bits and create multiple archive slices. (Unstable when the file
    set is dynamic and older files are copied into the tree.)

    Two nearly optimal solutions which allow comparing individual files:

    1st:
    + an (s)ftp(s)-to-zip/tar bridge seems to be possible, e.g. by hooking
    ZipFile to use a virtual self.fp
    + the files would be individually encrypted with a password
    - an external tool like "gpg -c" is necessary (or is there good
    encryption in a native Python module? Is PGP (password only) possible
    with a native Python module?)
    - the filenames would be visible

    2nd:
    + manage a dummy file tree locally for speedy comparison (with 0-length
    files)
    + create encrypted archive slices for upload with iterated filenames
    - an external tool like "gpg -c" is necessary
    - extra file tree or file attribute database
    - unrolling status from multiple archive slices is arduous
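
    The ZipFile bridge from the first option can be tried out without any
    hooking, since zipfile.ZipFile accepts any seekable file-like object.
    In the sketch below an in-memory buffer stands in for the remote file;
    CountingFile, read_member and append_member are hypothetical names, and
    a real bridge would translate read/seek/write into FTP or SFTP calls.
    Note that the zipfile module can decrypt legacy password-protected
    entries but cannot create encrypted ones, so the members would have to
    be encrypted before they are stored.

```python
import io
import zipfile


class CountingFile(io.BytesIO):
    """In-memory stand-in for a remote file; counts the bytes actually read."""

    def __init__(self, initial=b""):
        super().__init__(initial)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def read_member(fileobj, name):
    """Read one member; ZipFile seeks to it instead of scanning the archive."""
    with zipfile.ZipFile(fileobj, "r") as zf:
        return zf.read(name)


def append_member(fileobj, name, data):
    """Append one member to an existing archive through the file object."""
    with zipfile.ZipFile(fileobj, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
```

    Reading a single member touches only the directory at the end of the
    archive plus that member, and appending rewrites only the tail, which
    is the access pattern a remote read/seek/write protocol would need.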

    Robert
     
    robert, Mar 11, 2006
    #5
  6. On Sat, 11 Mar 2006 16:09:22 +0100, robert wrote:

    >> Lastly, have you considered that your attempted solution is completely the
    >> wrong way to solve the problem? If you explain _what_ you are wanting to
    >> do, rather than _how_ you want to do it, perhaps there is a better way.

    >
    > So, there seems to be a big barrier for that task, when encryption is on
    > the whole archive. A complex block navigation within a block cipher
    > would be required, and obviously there is no such (handy) code already
    > existing. Or is there an encryption/decryption method which you can
    > use like a file pipe _and_ which supports 'seek'?


    [snip]

    Let's try again: rather than you telling us what technology you want to
    use, tell us what your aim is. I suspect you are too close to the trees to
    see the forest -- you are focusing on the fine detail. Let's hear the big
    picture: what is the problem you are trying to solve? Because, frankly, as
    far as I can see, the solution you are looking for doesn't exist. But
    maybe I'm too far from the forest to see the individual trees.

    "I need encryption that supports seek" -- no, that's you telling us _how_
    you want to solve your problem.

    Perhaps you can tick some/all of the following requirements:

    - low bandwidth usage when updating the remote site

    - transmission needs to be secure

    - data on the remote site needs to be secure in case of theft or break-ins

    - remote site is under the control of untrusted parties;
    or remote site is trusted

    - remote site is an old machine with limited processing power and very
    small disk storage;
    or remote site can be any machine we choose

    - local site needs to run Windows/Macintosh/Linux/BSD/all of the above

    - remote site runs on Windows/Macintosh/Linux/BSD/anything we like

    - we are updating text files/binary files

    - anything else you can tell us about the nature of your problem



    --
    Steven.
     
    Steven D'Aprano, Mar 11, 2006
    #6
  7. robert

    robert Guest

    Steven D'Aprano wrote:

    > On Sat, 11 Mar 2006 16:09:22 +0100, robert wrote:
    >
    >
    >>>Lastly, have you considered that your attempted solution is completely the
    >>>wrong way to solve the problem? If you explain _what_ you are wanting to
    >>>do, rather than _how_ you want to do it, perhaps there is a better way.

    >>
    >>So, there seems to be a big barrier for that task, when encryption is on
    >>the whole archive. A complex block navigation within a block cipher
    >>would be required, and obviously there is no such (handy) code already
    >>existing. Or is there an encryption/decryption method which you can
    >>use like a file pipe _and_ which supports 'seek'?

    >
    >
    > [snip]
    >
    > Let's try again: rather than you telling us what technology you want to
    > use, tell us what your aim is. I suspect you are too close to the trees to
    > see the forest -- you are focusing on the fine detail. Let's hear the big
    > picture: what is the problem you are trying to solve? Because, frankly, as
    > far as I can see, the solution you are looking for doesn't exist. But
    > maybe I'm too far from the forest to see the individual trees.
    >
    > "I need encryption that supports seek" -- no, that's you telling us _how_
    > you want to solve your problem.
    >
    > Perhaps you can tick some/all of the following requirements:
    >
    > - low bandwidth usage when updating the remote site
    >
    > - transmission needs to be secure
    >
    > - data on the remote site needs to be secure in case of theft or break-ins
    >
    > - remote site is under the control of untrusted parties;
    > or remote site is trusted
    >
    > - remote site is an old machine with limited processing power and very
    > small disk storage;
    > or remote site can be any machine we choose
    >
    > - local site needs to run Windows/Macintosh/Linux/BSD/all of the above
    >
    > - remote site runs on Windows/Macintosh/Linux/BSD/anything we like
    >
    > - we are updating text files/binary files
    >
    > - anything else you can tell us about the nature of your problem


    The main requirement is that it has to become a cohesive, reusable,
    portable (FTP/SFTP standard) functionality, as mentioned in the OP.
    Ideally a Python module, for integration in a bigger Python app, not a
    one-time admin hack with a bunch of tools to be fiddled together on each
    user machine. So the 'how' is mostly == 'what'. It's a Python question so far.

    The last 2 methods I mentioned already are maybe a way to a compromise
    (if integrated one-stream encryption cannot be managed).

    The only issue remaining: a native Python module for PGP (password
    only) encryption or another kind of good (commonly supported)
    encryption. ZIP2 encryption itself seems to be too weak? (Still so in
    recent ZIP formats? What about the mode of 7zip etc.?) But I found no
    Python modules for either.

    http://www.amk.ca/python/code/gpg just calls into an external gpg
    installation.

    Can the functionality of "gpg -c" maybe be fiddled together with PyCrypto
    easily? (variable-length key/password only, no public key stuff required)

    And what about ZIP password-only encryption itself? Are there maybe any
    usable improvements?

    And: when there are many files encrypted with the same password (both
    PGP and ZIP), will this decrease the strength of encryption?
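
    On the password-only part: the first half of such a scheme is turning
    a variable-length password into a fixed-length cipher key. A minimal
    sketch with the standard library follows. It uses PBKDF2 rather than
    the OpenPGP string-to-key method, so the result is not compatible with
    "gpg -c"; derive_key, new_file_key and the iteration count are
    assumptions for illustration, and the cipher that would consume the key
    (e.g. from PyCrypto) is not shown.

```python
import hashlib
import os

ITERATIONS = 200_000  # assumed work factor; tune for the target hardware


def derive_key(password, salt, length=32):
    """Derive a fixed-length key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=length
    )


def new_file_key(password):
    """Return (salt, key) for one file; the salt is stored next to the ciphertext."""
    salt = os.urandom(16)
    return salt, derive_key(password, salt)
```

    With a fresh random salt per file, every file ends up with a different
    key even though the password is the same; the salt is not secret and
    only needs to be kept with the file so the key can be derived again.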

    Robert
     
    robert, Mar 11, 2006
    #7
  8. robert

    Guest

    Would rsync into a remote encrypted filesystem work for you?
     
    , Mar 13, 2006
    #8
  9. robert

    robert Guest

    wrote:

    > Would rsync into a remote encrypted filesystem work for you?
    >


    The sync (selection) is custom anyway. The remote filesystem is
    general/unknown. FTP(S) / SFTP is the only standard given.
     
    robert, Mar 13, 2006
    #9
