Reading compressed file and its conversion to numbers.

Henry Lenzi

Hi All --

I have some questions related to text compression and fonts.
If I use a shell, like this

$ gzip < f1.txt > TEST1

and then read the file

$ cat TEST1

It appears like this (ANSI extended charset; hope you can see this):

;Â0D??åð?(?HV,ëàQ*DAqÿSôæÍ\±Ì1|"ßs
AT=yy?¸Ô|Á;®e©¨ÂèØ®°\ æÝTÏ*¹µ¨¯fÊÑhFÖŻܡR$ÚºïÛY½=ãå9ñöa»Û¯«+R %(Óc NEzýmUl ~{EöÎýñ§É$

Great! I wanted that! But if I try to use zip with perl:

#!/usr/bin/perl
use warnings;
use strict;


my $source = shift @ARGV;
#my $destination = shift @ARGV;

open IN, $source;
#open OUT, "|gzip > $destination";
open OUT, "|gzip > ./TEST";
close IN;
close OUT;

then I can't cat the file. Nothing appears, because it's compressed.
I can do it with the shell, but can't do with perl.
Why is that? What must I do?

2) Regarding the compressed string:

;Â0D??åð?(?HV,ëàQ*DAqÿSôæÍ\±Ì1|"ßs
AT=yy?¸Ô|Á;®e©¨ÂèØ®°\ æÝTÏ*¹µ¨¯fÊÑhFÖŻܡR$ÚºïÛY½=ãå9ñöa»Û¯«+R %(Óc NEzýmUl ~{EöÎýñ§É$

Is there a way to read it and obtain the ANSI extended charset hexadecimal _numbers_?
For instance, the above line would display:

3B C2 4F etc...

I have attempted this:

#!/usr/bin/perl
use warnings;
use strict;
use utf8;

my $source = shift @ARGV;
my $destination = shift @ARGV;

open (IN, "< $source");

my @array = unpack("C*", $source);

print "@array,\n";

close IN;

but the result is that it won't read the _content_ of the file.

$ perl prog.pl TEST1
84 69 83 84 49,

What is the issue here?

Any help is greatly appreciated.

Henry
 
jl_post

Henry said:
But if I try to use zip with perl:

#!/usr/bin/perl
use warnings;
use strict;


my $source = shift @ARGV;
#my $destination = shift @ARGV;

open IN, $source;
#open OUT, "|gzip > $destination";
open OUT, "|gzip > ./TEST";
close IN;
close OUT;


Is this your whole program? If so, you're not sending anything to the
OUT filehandle. Therefore, I'd be surprised if anything at all gets
written to ./TEST .

Try adding the line "print OUT <IN>;" before your close() statements
and see if that fixes anything.

Another tip: Try adding " or die $!;" after your open() statements in
case they are the reason your program isn't doing what you'd expect.
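
A minimal sketch of both suggestions combined, assuming gzip is on the PATH and the destination name contains no shell metacharacters. The helper name gzip_file is hypothetical, not from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy $source through an
# external gzip process and write the compressed bytes to $destination.
# Assumes gzip is on PATH and $destination has no shell metacharacters.
sub gzip_file {
    my ($source, $destination) = @_;

    open(my $in, '<', $source)
        or die "Cannot open $source: $!";
    binmode($in);

    open(my $out, '|-', "gzip > $destination")
        or die "Cannot start gzip: $!";
    binmode($out);

    # The step missing from the original script: actually send the
    # input to gzip, one 64 KB block at a time.
    local $/ = \65536;
    while (my $block = <$in>) {
        print {$out} $block;
    }

    close($in);
    close($out)
        or die "gzip failed: ", ($! || "exit status $?");
}
```

Called as gzip_file('f1.txt', 'TEST'), this should leave the same kind of compressed file that the shell command produced.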
 
jl_post

Henry said:
Is there a way to read it and obtain the ANSI extended
charset hexadecimal _numbers_?
For instance, the above line would display:

3B C2 4F etc...

I have attempted this:

#!/usr/bin/perl
use warnings;
use strict;
use utf8;

my $source = shift @ARGV;
my $destination = shift @ARGV;

open (IN, "< $source");

my @array = unpack("C*", $source);

print "@array,\n";

close IN;

but the result is that it won't read the _content_ of the file.

$ perl prog.pl TEST1
84 69 83 84 49,

What is the issue here?


Dear Henry,

Your unpack statement unpacks the text in the variable $source,
which was taken from @ARGV array (so in your case, $source contains
"TEST1"). Therefore, the ASCII values for 'T', 'E', 'S', 'T', and '1'
were printed.

Try changing the following two lines from:

code> my @array = unpack("C*", $source);
code> print "@array,\n";

to:

code> my $fileContents = join('', <IN>);
code> my $hex = unpack("H*", $fileContents);
code> print "$hex\n";

and see if that fixes things.
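
The change above could be wrapped up roughly as follows. This is a sketch under a few assumptions: the helper name file_to_hex is made up, and binmode is added because the file holds compressed (binary) data. Note that "use utf8" only declares the encoding of the script's own source text and has no effect on how files are read, so it is left out here:

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes of the file at $path as one
# lowercase hex string.
sub file_to_hex {
    my ($path) = @_;

    open(my $in, '<', $path)
        or die "Cannot open $path: $!";
    binmode($in);                 # raw bytes, no newline translation

    local $/;                     # undef separator: read everything at once
    my $contents = <$in>;
    close($in);

    return unpack("H*", defined $contents ? $contents : '');
}
```

With this in place, print file_to_hex('TEST1'), "\n"; would print the hex digits of the file's contents rather than those of its name.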

Happy Perling!

-- Jean-Luc
 
jl_post

Henry said:
> Some considerations:
>
> 1)
> What did the trick was passing the array reference to unpack, in:
>
> my $data_r = \@data;
>
> my $hex = unpack("H*", $data_r);
>
> From what I read in the Camel book, there wasn't a way to do it
> _without_ passing a reference, because all has to be put into a
> string. Right?

No -- you don't want to pass in a reference! You're right in saying
that it all has to be put in a string -- so you should join the array
into one long string with a line like:

my $dataString = join('', @data);

Then you can get the hex value with:

my $hex = unpack("H*", $dataString);

If you pass a reference in (which is what you did), the reference
will get converted to a string -- but not the one you want.

Try this code:

print 'String value of \@data: "', \@data, "\"\n";

What you'll see as output is something like this:

String value of \@data: "ARRAY(0x1a4deec)"

So when you pass in an array reference to unpack(), you won't be
converting its contents to hex, but rather a string like
"ARRAY(0x1a4deec)" instead.
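
To make the difference concrete, here is a small sketch comparing the two calls. The sample @data is made up for illustration; the last step decodes the hex output quoted in this thread with pack("H*", ...), the inverse operation:

```perl
use strict;
use warnings;

my @data = ("\x3b\xc2", "\x4f");   # made-up sample "lines" of binary data

# A reference in string context becomes text like "ARRAY(0x...)",
# so this is the hex of that text, not of the data.
my $hex_of_ref = unpack("H*", \@data);

# Joining first gives unpack the bytes themselves.
my $hex_of_data = unpack("H*", join('', @data));

print "reference: $hex_of_ref\n";
print "data:      $hex_of_data\n";    # prints "data:      3bc24f"

# pack("H*", ...) reverses the conversion; applied to the output
# quoted in the thread, it recovers the stringified reference.
my $decoded = pack("H*", "41525241592830783831393130373829");
print "$decoded\n";                   # prints "ARRAY(0x8191078)"
```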

> 2)
> The output is:
>
> 41525241592830783831393130373829
>
> What I don't understand is why there aren't any numbers in hex
> format.

I think you mean that no hexadecimal digits from A through F appear.
That's because when converted to hex form, the characters
"ARRAY(0x1a4deec)" don't happen to use the digits A through F (I
thought that was strange, so I tried:

print unpack "H*", "ARRAY(0x1a4deec)";

just to prove to myself that only the hex digits 0-9 are used.)

> Is it the array ref number that was transformed, instead
> of the set of characters?

You got it! :)

> In that case, how would I get my @data passed to unpack? (Please
> don't tell me there isn't a way without iterating line-by-line).

(Fortunately, you don't have to iterate line-by-line.) As you
probably already know, all you have to do is pass a join()ed string to
unpack(), like this:

my $hex = unpack("H*", join('', @data));

> Any ideas/comments/help is appreciated.

Just one more thing:

Using "H*" (and "B*") with the unpack() command confused me quite a
bit until I realized that the code 'unpack("H*",$string)' takes ONE
scalar string and returns ONE scalar string. Before realizing that I
tried passing a string expecting to return a list. In your case, you
were trying to pass in an array, expecting an array.

We both had our misconceptions, so just remember that unpacking with
"H*" (and "B*") takes one scalar and returns one scalar, and you should
do just fine.
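
A short sketch of that point, using a made-up three-byte string: even in list context "H*" yields a single string, whereas "C*" yields one number per byte.

```perl
use strict;
use warnings;

my $string = "AB\xCC";                # sample bytes: 'A', 'B', 0xCC

my @hex   = unpack("H*", $string);    # list with a single hex string
my @bytes = unpack("C*", $string);    # list with one number per byte

print scalar(@hex),   " element: @hex\n";      # prints "1 element: 4142cc"
print scalar(@bytes), " elements: @bytes\n";   # prints "3 elements: 65 66 204"
```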

I hope this helps, Henry.

-- Jean-Luc
 
Henry Lenzi

Hi --

Thanks for your help. I was able to improve it - but there's a bug, I
think - with the following:

###################################
#!/usr/bin/perl
use warnings;
use strict;

my $source = shift @ARGV;
my $destination = shift @ARGV;
open IN, $source;
#open OUT, "|gzip > $destination";
open OUT, "|gzip > ./OUTPUT";

print OUT <IN>;

open OUTPUTFILE, "< ./OUTPUT";

my @data;
@data = <OUTPUTFILE>;

print "@data\n";

my $data_r = \@data;

my $hex = unpack("H*", $data_r);

print "$hex\n";

close IN;
close OUT;
close OUTPUTFILE;
##################################

Some considerations:

1)
What did the trick was passing the array reference to unpack, in:

my $data_r = \@data;

my $hex = unpack("H*", $data_r);

From what I read in the Camel book, there wasn't a way to do it
_without_ passing a reference, because all has to be put into a
string. Right?

2)
The output is:

comp@ttyp2[play]$ perl p1.pl FILE1 OUTPUT2
ÝtB
ñ
ñqµUPòM,.I-ÊÌKWpÌIÏ/Ê,ÉÈ-R
©E9J
¡!þA@^ùy
þEåy%J:mad:^bQvv¾GfjIb^fjn&X0?#Ï7Ù%?/1'EIÁÕÅ3¬Ó_=(53'§R
:%mru
41525241592830783831393130373829

What I don't understand is why there aren't any numbers in hex
format. For instance, Ì is hex CC, decimal 204 already. So there's
something wrong, but I don't have a clue. Is it the array ref number
that was transformed, instead of the set of characters?
In that case, how would I get my @data passed to unpack? (Please
don't tell me there isn't a way without iterating line-by-line).

Any ideas/comments/help is appreciated.
TIA,

Henry
 
Bart Lateur

Henry said:
Is there a way to read it and obtain the ANSI extended charset hexadecimal _numbers_?
For instance, the above line would display:

3B C2 4F etc...

unpack "H*", $string;

will do it directly, but it won't have those spaces between the bytes.
Oh, and it's in lower case too.

Perhaps try this:

$hex = join " ", map {sprintf "%02X", $_ } unpack 'C*', $string;

You can process the file a chunk at a time.
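
One way the chunked approach might look, as a sketch only. The helper name hex_dump, the optional output handle, and the default of 16 bytes per line are illustrative choices, not part of the suggestion above:

```perl
use strict;
use warnings;

# Hypothetical helper: print the file at $path as upper-case hex bytes
# separated by spaces, one line per chunk of $chunk_size bytes.
sub hex_dump {
    my ($path, $out, $chunk_size) = @_;
    $out        ||= \*STDOUT;
    $chunk_size ||= 16;

    open(my $in, '<', $path)
        or die "Cannot open $path: $!";
    binmode($in);

    # read() returns 0 at end of file, which ends the loop.
    while (read($in, my $chunk, $chunk_size)) {
        print {$out}
            join(' ', map { sprintf '%02X', $_ } unpack('C*', $chunk)),
            "\n";
    }
    close($in);
}
```

Calling hex_dump('TEST1') would then print the file sixteen bytes per line without holding the whole file in memory.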
 
strangeman

Because of deep computational truths, I had to separate the number with _.
This allowed me to pack and unpack precisely.
Thanks a lot.

Henry
 
