corrupt zip files


kerravon

Apologies - this question is only vaguely related to C.

1. It's likely to be a C program causing the corruption.
2. I used a C program to do the statistical analysis.

But I thought you guys might have the required experience.


I have a zip file that appears to have been produced using pkzip for
z/OS.

However, it looks like it has been transmitted using some sort of text
protocol, because the high bit has been stripped from most bytes, and
some other bytes appear to have been translated. e.g. I think x'0a' in
the input file has been mangled to x'b6' on the way. Does anyone know
what software would do a translation like that?

I believe these characters (hex byte value, then its count in the mangled file):
0A 1060
0D 0
12 1044
14 0
15 0
1C 0
1E 0
24 1030
7F 0

are being mapped to these:
81 576
9C 361
A7 645
B6 527
BF 284
E0 249
E9 644
F2 718

Except for x'0d' which I think is just being deleted. This belief
comes from counting (see below) occurrences of the various bytes in a
largish (80k) file.

Here is a small file that shows the problem:

000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF

That x'b6' above should be x'0a' I think (that's more normal).
Apparently some protocol doesn't think that x'0a' will make it
through, so translates it in advance.

000010 30447A00 00002802 00000800 00005851 0Dz...(.......XQ
000020 46303130 38313510 310A4330 0C447740 F010815.1.C0.Dw@
....
0000A0 504B0102 780BB600 06000800 64552240 PK..x.......dU"@
0000B0 57463044 7A000000 28020000 08003401 WF0Dz...(.....4.
0000C0 00000000 01000000 00000000 00005851 ..............XQ
0000D0 46303130 38316500 30016973 79700006 F01081e.0.isyp..

This x'69737970' is really (once the high bit is added back) x'E9F3F9F0',
i.e. 'Z390' in EBCDIC, which is something that pkzip for z/OS (MVS)
inserts. So I can easily see that this file originated on the mainframe,
and that the high bit has been stripped from most characters.
Interestingly, I do see x'E9' in the output file, even though a real
x'E9' has clearly been stripped above. So real x'E9' bytes are being
stripped, while some other character is probably causing x'E9' to be
produced. Possibly the file has gone through two pieces of software to
produce this effect.

0000E0 00014000 00050002 10000600 04022800 ..@...........(.
0000F0 06005C00 08000700 05000001 00070006 ..\.............
000100 00004B00 05000740 0040000B 00064462 ..K....@.@....Db
000110 52707073 40404040 40404040 40404040 Rpps@@@@@@@@@@@@
000120 40404040 40404040 40400A @@@@@@@@@@.

Does anyone have any idea what protocol (ftp, sftp, winscp, kermit,
connect:direct, http, pgp) would affect data in this manner? I've
never seen mangling like that before.

Thanks. Paul.



statistical analysis on a large zip file similarly mangled:

00 594
01 698
02 740
03 488
04 749
05 697
06 536
07 526
08 545
09 854
0A 1060
0B 597
0C 641
0D 0
0E 608
0F 639
10 817
11 679
12 1044
13 641
14 0
15 0
16 621
17 533
18 554
19 611
1A 517
1B 555
1C 0
1D 612
1E 0
1F 639
20 731
21 592
22 607
23 549
24 1030
25 546
26 565
27 618
28 490
29 602
2A 436
2B 684
2C 629
2D 706
2E 589
2F 667
30 547
31 815
32 670
33 530
34 569
35 598
36 570
37 619
38 723
39 572
3A 508
3B 626
3C 676
3D 626
3E 615
3F 621
40 687
41 598
42 636
43 557
44 752
45 579
46 645
47 813
48 849
49 837
4A 617
4B 563
4C 587
4D 532
4E 618
4F 705
50 538
51 541
52 553
53 573
54 467
55 416
56 653
57 681
58 727
59 599
5A 560
5B 344
5C 611
5D 293
5E 663
5F 552
60 578
61 562
62 608
63 814
64 649
65 711
66 515
67 660
68 484
69 506
6A 528
6B 627
6C 773
6D 646
6E 627
6F 599
70 602
71 657
72 636
73 620
74 521
75 516
76 732
77 631
78 596
79 715
7A 551
7B 718
7C 621
7D 606
7E 630
7F 0
80 0
81 576
82 0
83 0
84 0
85 0
86 0
87 0
88 0
89 0
8A 0
8B 0
8C 0
8D 0
8E 0
8F 0
90 0
91 0
92 0
93 0
94 0
95 0
96 0
97 0
98 0
99 0
9A 0
9B 0
9C 361
9D 0
9E 0
9F 0
A0 0
A1 0
A2 0
A3 0
A4 0
A5 0
A6 0
A7 645
A8 0
A9 0
AA 0
AB 0
AC 0
AD 0
AE 0
AF 0
B0 0
B1 0
B2 0
B3 0
B4 0
B5 0
B6 527
B7 0
B8 0
B9 0
BA 0
BB 0
BC 0
BD 0
BE 0
BF 284
C0 0
C1 0
C2 0
C3 0
C4 0
C5 0
C6 0
C7 0
C8 0
C9 0
CA 0
CB 0
CC 0
CD 0
CE 0
CF 0
D0 0
D1 0
D2 0
D3 0
D4 0
D5 0
D6 0
D7 0
D8 0
D9 0
DA 0
DB 0
DC 0
DD 0
DE 0
DF 0
E0 249
E1 0
E2 0
E3 0
E4 0
E5 0
E6 0
E7 0
E8 0
E9 644
EA 0
EB 0
EC 0
ED 0
EE 0
EF 0
F0 0
F1 0
F2 718
F3 0
F4 0
F5 0
F6 0
F7 0
F8 0
F9 0
FA 0
FB 0
FC 0
FD 0
FE 0
FF 0
 
Jens Thoms Toerring

kerravon said:
> Apologies - this question is only vaguely related to C.
> 1. It's likely to be a C program causing the corruption.

That doesn't make it on-topic here - clc is for discussions
about the C language and not about the behaviour of the
millions of programs written in C.

> 2. I used a C program to do the statistical analysis.

There's nothing in your post about that program...

> But I thought you guys might have the required experience.
> I have a zip file that appears to have been produced using pkzip for
> z/OS.
> However, it looks like it has been transmitted using some sort of text
> protocol, because the high bit has been stripped from most bytes, and
> some other bytes appear to have been translated. e.g. I think x'0a' in
> the input file has been mangled to x'b6' on the way. Does anyone know
> what software would do a translation like that?
> I believe these characters:
> are being mapped to these:
> Except for x'0d' which I think is just being deleted. This belief
> comes from counting (see below) occurrences of the various bytes in a
> largish (80k) file.

If that is the case, there is e.g. ftp, which drops carriage
return characters (0x0d) along the way when transferring in ASCII
mode if the system you transfer the file to has the convention
that lines end with just a new line character (0x0a).

> Here is a small file that shows the problem:
> 000000 504B0304 B6000600 08006455 22405746 PK........dU"@WF
> That x'b6' above should be x'0a' I think (that's more normal).

I have no idea how all of this should help you. It is unlikely
that you can repair the file - about as probable as getting
the cow back after you have processed it through a meat
mincer. There's no way to figure out where '0x0d' was removed
or which bytes had their upper bit stripped - even if you
knew which tools were used. So simply go back and transfer the
file again, taking care to use the right tools (or use them
with the correct options to avoid getting it garbled).

Regards, Jens
 
kerravon

> I have no idea how all of this should help you. It is unlikely
> that you can repair the file - about as probable as getting
> the cow back after you have processed it through a meat
> mincer. There's no way to figure out where '0x0d' was removed
> or which bytes had their upper bit stripped - even if you
> knew which tools were used. So simply go back and transfer the
> file again, taking care to use the right tools (or use them
> with the correct options to avoid getting it garbled).

It is for my professional knowledge. I like to be able to look
at a corrupted file and be able to immediately diagnose it as
"you have used connect:direct/pgp/whatever to transfer in
text mode, please resend in binary mode". Currently I have to
say "I have no idea what you have done wrong, but some leg of
the journey has corruption that I have never seen before in
my decades-long career". I don't like that. Just a personal thing.
I was hoping those symptoms would be something that someone
else has seen before in their decades-long career and can
name the software that has this "distinctive" behaviour.

BFN. Paul.
 
Malcolm McLean

On Sunday, 6 May 2012 12:16:41 UTC+1, kerravon wrote:
> It is for my professional knowledge. I like to be able to look
> at a corrupted file and be able to immediately diagnose it as
> "you have used connect:direct/pgp/whatever to transfer in
> text mode, please resend in binary mode".

It's obviously been corrupted twice: first by being uuencoded (that's binary converted to ASCII nonsense characters by treating the binary as a sequence of 7-bit bytes, as I'm sure you know), and then a second process has inserted some non-ASCII characters.

The first thing I would do is some second-order modelling on those characters, to try to see if they occur at random or before or after any particular byte.

Then if you don't get any joy, do what you've done, dry-running it through pkunzip to try to work out what the bytes should be.
 
Jens Thoms Toerring

> It is for my professional knowledge. I like to be able to look
> at a corrupted file and be able to immediately diagnose it as
> "you have used connect:direct/pgp/whatever to transfer in
> text mode, please resend in binary mode". Currently I have to
> say "I have no idea what you have done wrong, but some leg of
> the journey has corruption that I have never seen before in
> my decades-long career". I don't like that. Just a personal thing.
> I was hoping those symptoms would be something that someone
> else has seen before in their decades-long career and can
> name the software that has this "distinctive" behaviour.

Ah, I had assumed there was a C question hidden somewhere
in this. Have you considered asking in comp.programming?

Regards, Jens
 
Jorgen Grahn

> Ah, I had assumed there was a C question hidden somewhere
> in this. Have you considered asking in comp.programming?

I think his chances may be better in some mainframe-related group.

Or maybe alt.folklore.computers, even though they don't normally solve
problems there. Perhaps they'd enjoy this one.
It's an interesting exercise.

/Jorgen
 
kerravon

> It's obviously been corrupted twice, first by being uuencoded (that's binary converted to ASCII nonsense characters by treating the binary as a sequence of 7-bit bytes, as I'm sure you know)

I wasn't aware that uuencode treated things as 7-bit bytes. I tried it
(see below) and it appears 8-bit clean to me.

> and then a second process has inserted some non-ascii characters.

Someone has a theory that it is code page conversion.

> The first thing I would do is some second order modelling on those characters, to try to see if they occur at random or before or after any particular byte.

Ok, I'll try looking harder.

> Then if you don't get any joy, do what you've done, dry-running it through pkunzip to try to work out what the bytes should be.

Basically every second byte on average will be corrupt (high bit
stripped). I can't go on doing that.

BFN. Paul.



C:\scratch\uu>hexdump temp2.txt
000000 F1F2F3 ...

C:\scratch\uu>\cygwin\bin\uuencode temp2.txt temp2.txt >temp3.txt

C:\scratch\uu>\cygwin\bin\uudecode temp3.txt -o temp4.txt

C:\scratch\uu>hexdump temp4.txt
000000 F1F2F3 ...
 
Nobody

> I wasn't aware that uuencode treated things as 7-bit bytes. I tried it
> (see below) and it appears 8-bit clean to me.

I think he meant 6-bit bytes. uuencoding concatenates 3 8-bit bytes
into a 24-bit value which it then splits into 4 6-bit values. Each of
those values has 32 added to it, shifting the range from 0-63 to 32-95,
all of which should be passed verbatim by most text-oriented communication
channels.
 
kerravon

> Just from the environment involved, I'm going to guess that in the
> transfer from mainframe to PC, it underwent EBCDIC to ASCII (or
> vice versa) translation, plus FTP in text mode can fiddle with line
> endings.  This involves far more than just stripping the high bit
> off.  Zip files aren't really in EBCDIC or ASCII anyway (the file
> content part) - most of it is compressed data that is 8-bit raw
> data.

If it had gone through a single EBCDIC to ASCII
conversion, then I wouldn't be seeing the ASCII "PK"
at the start, and I wouldn't be seeing the EBCDIC
"Z390" (minus stripped high bit).

It is possible that it went through an erroneous
EBCDIC to ASCII then ASCII to EBCDIC, or even
ASCII to EBCDIC followed by EBCDIC to ASCII, to
produce the effect.

> For more information, compare a hex dump of the file on the mainframe
> vs. the file on the PC.

That is easier said than done, as it requires
other people to be cooperative on what they
consider to be none of my business. :)

BFN. Paul.
 
 
 