corrupt zip files

Discussion in 'C Programming' started by kerravon, May 6, 2012.

  1. kerravon

    kerravon Guest

    Apologies - this question is only vaguely related to C.

    1. It's likely to be a C program causing the corruption.
    2. I used a C program to do the statistical analysis.

    But I thought you guys might have the required experience.


    I have a zip file that appears to have been produced using pkzip for
    z/OS.

    However, it looks like it has been transmitted using some sort of text
    protocol, because the high bit has been stripped from most bytes, and
    some other bytes appear to have been translated. e.g. I think x'0a' in
    the input file has been mangled to x'b6' on the way. Does anyone know
    what software would do a translation like that?

    I believe these characters:

    > 0A 1060
    > 0D 0
    > 12 1044
    > 14 0
    > 15 0
    > 1C 0
    > 1E 0
    > 24 1030
    > 7F 0


    are being mapped to these:

    > 81 576
    > 9C 361
    > A7 645
    > B6 527
    > BF 284
    > E0 249
    > E9 644
    > F2 718


    Except for x'0d' which I think is just being deleted. This belief
    comes from counting (see below) occurrences of the various bytes in a
    largish (80k) file.

    Here is a small file that shows the problem:

    000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF

    That x'b6' above should be x'0a' I think (that's more normal).
    Apparently some protocol doesn't think that x'0a' will make it
    through, so translates it in advance.

    000010 30447A00 00002802 00000800 00005851 0Dz...(.......XQ
    000020 46303130 38313510 310A4330 0C447740 F010815.1.C0.Dw@
    ....
    0000A0 504B0102 780BB600 06000800 64552240 PK..x.......dU"@
    0000B0 57463044 7A000000 28020000 08003401 WF0Dz...(.....4.
    0000C0 00000000 01000000 00000000 00005851 ..............XQ
    0000D0 46303130 38316500 30016973 79700006 F01081e.0.isyp..

    This x'69737970' is really (once high bit is added back) x'E9F3F9F0'
    ie 'Z390' ie something that pkzip for z/OS (MVS) inserts. ie I can
    easily see that this file originated on the mainframe. And I can
    easily see that the high bit has been stripped on most characters.
    Interestingly I do see x'E9' in the output file, even though I can
    easily see that x'E9' has been stripped above. So real x'E9' are being
    stripped, while probably some other character is causing x'E9' to be
    produced. Possibly it has gone through two pieces of software to
    produce this effect.
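The high-bit reasoning above can be checked mechanically. The following is only a minimal sketch: the function names are hypothetical, and the decoder handles just the digits and uppercase letters, whose positions are the same in the common EBCDIC code pages (which code page the mainframe actually used is not known here).

```c
#include <assert.h>
#include <stddef.h>

/* Decode one EBCDIC byte. Only 0-9 and A-Z are handled (an assumption made
   to keep the sketch short); every other value is reported as '?'. */
char ebcdic_alnum_to_ascii(unsigned char e)
{
    if (e >= 0xF0 && e <= 0xF9) return (char)('0' + (e - 0xF0));
    if (e >= 0xC1 && e <= 0xC9) return (char)('A' + (e - 0xC1));
    if (e >= 0xD1 && e <= 0xD9) return (char)('J' + (e - 0xD1));
    if (e >= 0xE2 && e <= 0xE9) return (char)('S' + (e - 0xE2));
    return '?';
}

/* Set the high bit on each of n bytes and decode the result as EBCDIC.
   out must have room for n + 1 chars. */
void restore_high_bit_and_decode(const unsigned char *in, size_t n, char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = ebcdic_alnum_to_ascii((unsigned char)(in[i] | 0x80));
    out[n] = '\0';
}
```

Feeding it the four bytes 69 73 79 70 from the dump yields the string "Z390", which matches the reading given above.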

    0000E0 00014000 00050002 10000600 04022800 ..@...........(.
    0000F0 06005C00 08000700 05000001 00070006 ..\.............
    000100 00004B00 05000740 0040000B 00064462 ..K....@.@....Db
    000110 52707073 40404040 40404040 40404040 Rpps@@@@@@@@@@@@
    000120 40404040 40404040 40400A @@@@@@@@@@.

    Does anyone have any idea what protocol (ftp, sftp, winscp, kermit,
    connect:direct, http, pgp) would affect data in this manner? I've
    never seen mangling like that before.

    Thanks. Paul.



    statistical analysis on a large zip file similarly mangled:

    00 594
    01 698
    02 740
    03 488
    04 749
    05 697
    06 536
    07 526
    08 545
    09 854
    0A 1060
    0B 597
    0C 641
    0D 0
    0E 608
    0F 639
    10 817
    11 679
    12 1044
    13 641
    14 0
    15 0
    16 621
    17 533
    18 554
    19 611
    1A 517
    1B 555
    1C 0
    1D 612
    1E 0
    1F 639
    20 731
    21 592
    22 607
    23 549
    24 1030
    25 546
    26 565
    27 618
    28 490
    29 602
    2A 436
    2B 684
    2C 629
    2D 706
    2E 589
    2F 667
    30 547
    31 815
    32 670
    33 530
    34 569
    35 598
    36 570
    37 619
    38 723
    39 572
    3A 508
    3B 626
    3C 676
    3D 626
    3E 615
    3F 621
    40 687
    41 598
    42 636
    43 557
    44 752
    45 579
    46 645
    47 813
    48 849
    49 837
    4A 617
    4B 563
    4C 587
    4D 532
    4E 618
    4F 705
    50 538
    51 541
    52 553
    53 573
    54 467
    55 416
    56 653
    57 681
    58 727
    59 599
    5A 560
    5B 344
    5C 611
    5D 293
    5E 663
    5F 552
    60 578
    61 562
    62 608
    63 814
    64 649
    65 711
    66 515
    67 660
    68 484
    69 506
    6A 528
    6B 627
    6C 773
    6D 646
    6E 627
    6F 599
    70 602
    71 657
    72 636
    73 620
    74 521
    75 516
    76 732
    77 631
    78 596
    79 715
    7A 551
    7B 718
    7C 621
    7D 606
    7E 630
    7F 0
    80 0
    81 576
    82 0
    83 0
    84 0
    85 0
    86 0
    87 0
    88 0
    89 0
    8A 0
    8B 0
    8C 0
    8D 0
    8E 0
    8F 0
    90 0
    91 0
    92 0
    93 0
    94 0
    95 0
    96 0
    97 0
    98 0
    99 0
    9A 0
    9B 0
    9C 361
    9D 0
    9E 0
    9F 0
    A0 0
    A1 0
    A2 0
    A3 0
    A4 0
    A5 0
    A6 0
    A7 645
    A8 0
    A9 0
    AA 0
    AB 0
    AC 0
    AD 0
    AE 0
    AF 0
    B0 0
    B1 0
    B2 0
    B3 0
    B4 0
    B5 0
    B6 527
    B7 0
    B8 0
    B9 0
    BA 0
    BB 0
    BC 0
    BD 0
    BE 0
    BF 284
    C0 0
    C1 0
    C2 0
    C3 0
    C4 0
    C5 0
    C6 0
    C7 0
    C8 0
    C9 0
    CA 0
    CB 0
    CC 0
    CD 0
    CE 0
    CF 0
    D0 0
    D1 0
    D2 0
    D3 0
    D4 0
    D5 0
    D6 0
    D7 0
    D8 0
    D9 0
    DA 0
    DB 0
    DC 0
    DD 0
    DE 0
    DF 0
    E0 249
    E1 0
    E2 0
    E3 0
    E4 0
    E5 0
    E6 0
    E7 0
    E8 0
    E9 644
    EA 0
    EB 0
    EC 0
    ED 0
    EE 0
    EF 0
    F0 0
    F1 0
    F2 718
    F3 0
    F4 0
    F5 0
    F6 0
    F7 0
    F8 0
    F9 0
    FA 0
    FB 0
    FC 0
    FD 0
    FE 0
    FF 0
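The program that produced these counts was not posted. A byte histogram of this kind could be sketched in C roughly as follows; the function names are hypothetical and this is not the original poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Add the occurrences of each byte value in buf to counts[]. */
void count_bytes(const unsigned char *buf, size_t n, unsigned long counts[256])
{
    assert(counts != NULL);
    for (size_t i = 0; i < n; i++)
        counts[buf[i]]++;
}

/* Read a file in binary mode and print one "XX count" line per byte value.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int print_histogram(const char *path)
{
    unsigned long counts[256] = {0};
    unsigned char buf[4096];
    size_t got;
    FILE *fp = fopen(path, "rb");   /* "rb": text mode could itself alter bytes */

    if (fp == NULL)
        return -1;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        count_bytes(buf, got, counts);
    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    for (int v = 0; v < 256; v++)
        printf("%02X %lu\n", (unsigned)v, counts[v]);
    return 0;
}
```

Opening the file with "rb" matters on systems whose C library translates line endings in text mode, since that would distort the very counts being measured.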
     
    kerravon, May 6, 2012
    #1

  2. kerravon <> wrote:
    > Apologies - this question is only vaguely related to C.


    > 1. It's likely to be a C program causing the corruption.


    That doesn't make it on-topic here - clc is for discussions
    about the C language and not about the behaviour of the mil-
    lions of programs written in C.

    > 2. I used a C program to do the statistical analysis.


    There's nothing in your post about that program...

    > But I thought you guys might have the required experience.


    > I have a zip file that appears to have been produced using pkzip for z/
    > OS.


    > However, it looks like it has been transmitted using some sort of text
    > protocol, because the high bit has been stripped from most bytes, and
    > some other bytes appear to have been translated. e.g. I think x'0a' in
    > the input file has been mangled to x'b6' on the way. Does anyone know
    > what software would do a translation like that?


    > I believe these characters:


    > > 0A 1060
    > > 0D 0
    > > 12 1044
    > > 14 0
    > > 15 0
    > > 1C 0
    > > 1E 0
    > > 24 1030
    > > 7F 0


    > are being mapped to these:


    > > 81 576
    > > 9C 361
    > > A7 645
    > > B6 527
    > > BF 284
    > > E0 249
    > > E9 644
    > > F2 718


    > Except for x'0d' which I think is just being deleted. This belief
    > comes from counting (see below) occurrences of the various bytes in a
    > largish (80k) file.


    If that is the case, ftp for example drops carriage return
    characters (0x0d) along the way when transferring in ASCII mode
    to a system whose convention is that lines end with just a
    newline character (0x0a).
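As a rough illustration of that kind of line-ending conversion, here is a minimal C sketch of a receiver that turns each CR LF pair into a bare LF. The function name is hypothetical, and real FTP implementations may behave differently in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes from in to out, dropping every carriage return (0x0D) that
   is immediately followed by a line feed (0x0A). out must have room for n
   bytes. Returns the number of bytes written. */
size_t crlf_to_lf(const unsigned char *in, size_t n, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x0D && i + 1 < n && in[i + 1] == 0x0A)
            continue;               /* drop the CR, keep the LF */
        out[w++] = in[i];
    }
    return w;
}
```

Applied to binary data this loses information, because the output no longer records where a CR used to be. Note that a CR not followed by LF survives this sketch, whereas the posted counts show no 0x0D bytes at all, so the actual process may have removed every CR.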

    > Here is a small file that shows the problem:


    > 000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF


    > That x'b6' above should be x'0a' I think (that's more normal).


    I have no idea how all of this should help you. It is unlikely
    that you can repair the file - about as probable as getting
    the cow back after you have processed it through a meat
    mincer. There's no way to figure out where 0x0d was removed
    or which bytes had their upper bit stripped - even if you
    knew which tools were used. So simply go back and transfer the
    file again, taking care to use the right tools (or use them
    with the correct options to avoid getting it garbled).

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, May 6, 2012
    #2

  3. kerravon

    kerravon Guest

    On May 6, 8:09 pm, (Jens Thoms Toerring) wrote:
    > > Here is a small file that shows the problem:
    > > 000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF
    > > That x'b6' above should be x'0a' I think (that's more normal).

    >
    > I have no idea how all of this should help you. It is unlikely
    > that you can repair the file - about as probable as getting
    > back the cow back after you have processed it through a meat-
    > mincer. There's no way to figure out where '0x0d' was removed
    > or which bytes had their upper bit stripped - even if you would
    > know which tools were used. So simply go back and transfer the
    > file again, taking care to use the right tools (or use them
    > with the correct options to avoid getting it garbled).


    It is for my professional knowledge. I like to be able to look
    at a corrupted file and be able to immediately diagnose it as
    "you have used connect:direct/pgp/whatever to transfer in
    text mode, please resend in binary mode". Currently I have to
    say "I have no idea what you have done wrong, but some leg of
    the journey has corruption that I have never seen before in
    my decades-long career". I don't like that. Just a personal thing.
    I was hoping those symptoms would be something that someone
    else has seen before in their decades-long career and can
    name the software that has this "distinctive" behaviour.

    BFN. Paul.
     
    kerravon, May 6, 2012
    #3
  4. On Sunday, 6 May 2012 12:16:41 UTC+1, kerravon wrote:
    >
    > It is for my professional knowledge. I like to be able to look
    > at a corrupted file and be able to immediately diagnose it as
    > "you have used connect:direct/pgp/whatever to transfer in
    > text mode, please resend in binary mode".
    >

    It's obviously been corrupted twice, first by being uuencoded (that's binary converted to ASCII nonsense characters by treating the binary as a sequence of 7-bit bytes, as I'm sure you know) and then a second process has inserted some non-ASCII characters.
    The first thing I would do is some second order modelling on those characters, to try to see if they occur at random or before or after any particular byte.

    Then if you don't get any joy, do what you've done, dry-running it through pkunzip to try to work out what the bytes should be.
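One way to read the "second order modelling" suggestion is to tabulate which byte immediately precedes each occurrence of a suspect value such as 0xB6. The following C sketch does that under that interpretation; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* For every occurrence of `target` in buf (other than at offset 0), add one
   to prev[] at the index of the byte that comes immediately before it. */
void count_predecessors(const unsigned char *buf, size_t n,
                        unsigned char target, unsigned long prev[256])
{
    assert(prev != NULL);
    for (size_t i = 1; i < n; i++)
        if (buf[i] == target)
            prev[buf[i - 1]]++;
}
```

If the suspect byte were inserted at random, the predecessor counts would be expected to follow the overall byte distribution; a strong skew towards a few values would hint at a context-dependent translation. The same loop with `buf[i + 1]` (and adjusted bounds) would tabulate successors.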
     
    Malcolm McLean, May 6, 2012
    #4
  5. kerravon <> wrote:
    > On May 6, 8:09 pm, (Jens Thoms Toerring) wrote:
    > > > Here is a small file that shows the problem:
    > > > 000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF
    > > > That x'b6' above should be x'0a' I think (that's more normal).


    > > I have no idea how all of this should help you. It is unlikely
    > > that you can repair the file - about as probable as getting
    > > back the cow back after you have processed it through a meat-
    > > mincer. There's no way to figure out where '0x0d' was removed
    > > or which bytes had their upper bit stripped - even if you would
    > > know which tools were used. So simply go back and transfer the
    > > file again, taking care to use the right tools (or use them
    > > with the correct options to avoid getting it garbled).


    > It is for my professional knowledge. I like to be able to look
    > at a corrupted file and be able to immediately diagnose it as
    > "you have used connect:direct/pgp/whatever to transfer in
    > text mode, please resend in binary mode". Currently I have to
    > say "I have no idea what you have done wrong, but some leg of
    > the journey has corruption that I have never seen before in
    > my decades-long career". I don't like that. Just a personal thing.
    > I was hoping those symptoms would be something that someone
    > else has seen before in their decades-long career and can
    > name the software that has this "distinctive" behaviour.


    Ah, I had assumed there was a C question hidden somewhere
    in this. Have you considered asking in comp.programming?

    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, May 6, 2012
    #5
  6. Jorgen Grahn

    Jorgen Grahn Guest

    On Sun, 2012-05-06, Jens Thoms Toerring wrote:
    > kerravon <> wrote:
    >> On May 6, 8:09 pm, (Jens Thoms Toerring) wrote:
    >> > > Here is a small file that shows the problem:
    >> > > 000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF
    >> > > That x'b6' above should be x'0a' I think (that's more normal).

    >
    >> > I have no idea how all of this should help you. It is unlikely
    >> > that you can repair the file - about as probable as getting
    >> > back the cow back after you have processed it through a meat-
    >> > mincer. There's no way to figure out where '0x0d' was removed
    >> > or which bytes had their upper bit stripped - even if you would
    >> > know which tools were used. So simply go back and transfer the
    >> > file again, taking care to use the right tools (or use them
    >> > with the correct options to avoid getting it garbled).

    >
    >> It is for my professional knowledge. I like to be able to look
    >> at a corrupted file and be able to immediately diagnose it as
    >> "you have used connect:direct/pgp/whatever to transfer in
    >> text mode, please resend in binary mode". Currently I have to
    >> say "I have no idea what you have done wrong, but some leg of
    >> the journey has corruption that I have never seen before in
    >> my decades-long career". I don't like that. Just a personal thing.
    >> I was hoping those symptoms would be something that someone
    >> else has seen before in their decades-long career and can
    >> name the software that has this "distinctive" behaviour.

    >
    > Ah, I had assumed there was a C question hidden somewhere
    > in this. Have you considered asking in comp.programming?


    I think his chances may be better in some mainframe-related group.

    Or maybe alt.folklore.computers, even though they don't normally solve
    problems there. Perhaps they'd enjoy this one.
    It's an interesting exercise.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, May 7, 2012
    #6
  7. kerravon

    kerravon Guest

    On May 7, 7:19 am, Malcolm McLean <>
    wrote:
    > On Sunday, 6 May 2012 12:16:41 UTC+1, kerravon wrote:
    >
    > > It is for my professional knowledge. I like to be able to look
    > > at a corrupted file and be able to immediately diagnose it as
    > > "you have used connect:direct/pgp/whatever to transfer in
    > > text mode, please resend in binary mode".

    >
    > It's obviously been corrupted twice, first by being uuencoded (that's binary converted to ASCII nonsense character by treating the binary as a sequence of 7-bit bytes, as I'm sure you know)


    I wasn't aware that uuencode treated things as 7-bit bytes. I tried it
    (see below) and it appears 8-bit clean to me.

    > and then a second process has inserted some non-ascii characters.


    Someone has a theory that it is code page conversion.

    > The first thing I would do is some second order modelling on those characters, to try to see if they occur at random or before or after any particular byte.


    Ok, I'll try looking harder.

    > Then if you don't get any joy, do what you've done, dry-running it through pkunzip to try to work out what the bytes should be.


    Basically every second byte on average will be corrupt (high bit
    stripped). I can't go on doing that.

    BFN. Paul.



    C:\scratch\uu>hexdump temp2.txt
    000000 F1F2F3 ...

    C:\scratch\uu>\cygwin\bin\uuencode temp2.txt temp2.txt >temp3.txt

    C:\scratch\uu>\cygwin\bin\uudecode temp3.txt -o temp4.txt

    C:\scratch\uu>hexdump temp4.txt
    000000 F1F2F3 ...
     
    kerravon, May 7, 2012
    #7
  8. Nobody

    Nobody Guest

    On Mon, 07 May 2012 08:24:36 -0700, kerravon wrote:

    >> It's obviously been corrupted twice, first by being uuencoded (that's
    >> binary converted to ASCII nonsense character by treating the binary as a
    >> sequence of 7-bit bytes, as I'm sure you know)

    >
    > I wasn't aware that uuencode treated things as 7-bit bytes. I tried it
    > (see below) and it appears 8-bit clean to me.


    I think he meant 6-bit bytes. uuencoding concatenates 3 8-bit bytes
    into a 24-bit value which it then splits into 4 6-bit values. Each of
    those values has 32 added to it, shifting the range from 0-63 to 32-95,
    all of which should be passed verbatim by most text-oriented communication
    channels.
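The grouping described above can be sketched in C as follows. This follows the description in this post only; the function name is hypothetical, and real uuencode implementations add line-length prefixes and commonly emit a backquote instead of a space for the value 0.

```c
#include <assert.h>
#include <stddef.h>

/* Encode exactly three input bytes as four printable characters: the 24
   bits are split into four 6-bit values and 32 is added to each.
   out must have room for 5 chars (four characters plus the terminator). */
void uu_encode_triple(const unsigned char in[3], char out[5])
{
    assert(in != NULL && out != NULL);
    unsigned long v = ((unsigned long)in[0] << 16)
                    | ((unsigned long)in[1] << 8)
                    |  (unsigned long)in[2];
    for (int i = 0; i < 4; i++)
        out[i] = (char)(32 + ((v >> (18 - 6 * i)) & 0x3F));
    out[4] = '\0';
}
```

For the bytes F1 F2 F3 used in the test earlier in the thread, the four 6-bit values are 60, 31, 11 and 51, so every output character falls in the range 32 to 95 even though all three input bytes have the high bit set.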
     
    Nobody, May 8, 2012
    #8
  9. kerravon

    kerravon Guest

    On May 8, 5:28 pm, (Gordon Burditt) wrote:
    > > I have a zip file that appears to have been produced using pkzip for z/
    > > OS.

    >
    > > However, it looks like it has been transmitted using some sort of text
    > > protocol, because the high bit has been stripped from most bytes, and
    > > some other bytes appear to have been translated. e.g. I think x'0a' in
    > > the input file has been mangled to x'b6' on the way. Does anyone know
    > > what software would do a translation like that?

    >
    > Just from the environment involved, I'm going to guess that in the
    > transfer from mainframe to PC, it underwent EBCDIC to ASCII (or
    > vice versa) translation, plus FTP in text mode can fiddle with line
    > endings.  This involves far more than just stripping the high bit
    > off.  Zip files aren't really in EBCDIC or ASCII anyway (the file
    > content part) - most of it is compressed data that is 8-bit raw
    > data.


    If it had gone through a single EBCDIC to ASCII
    conversion, then I wouldn't be seeing the ASCII "PK"
    at the start, and I wouldn't be seeing the EBCDIC
    "Z390" (minus stripped high bit).

    It is possible that it went through an erroneous
    EBCDIC to ASCII then ASCII to EBCDIC, or even
    ASCII to EBCDIC followed by EBCDIC to ASCII, to
    produce the effect.

    > For more information, compare a hex dump of the file on the mainframe
    > vs. the file on the PC.


    That is easier said than done, as it requires
    other people to be cooperative on what they
    consider to be none of my business. :)

    BFN. Paul.
     
    kerravon, May 8, 2012
    #9
  10. lena

    lena

    Joined:
    May 14, 2012
    Messages:
    1
    Hi....
    There is nothing to worry about with your corrupted zip and zipx files. You can easily recover them by using Repair zipx Software. This software helps you to repair a Zipx file after header corruption. The software also makes it possible to fix a Zipx archive corrupted by incomplete downloading. To know more about the software use this link.
    http://www.repairzipx.com/archive.html
    Also you can download the demo version of software here.....
     
    lena, May 14, 2012
    #10
  11. timss

    timss

    Joined:
    Jun 22, 2012
    Messages:
    1
    Hi...
    You can easily recover your Zip files which are corrupted. This can be done by using Zipx Repair Software. This software is compatible with Microsoft Windows 7, Windows Vista, Windows XP, Windows 2003 and Windows 2008. This is the best zipx archive repair tool.
    You can download the free trial version of software here....
     
    timss, Jun 22, 2012
    #11