Fastest Hex to Ascii routine

Mark H

I have been beating myself over the head looking for a faster hex to
ascii routine. I've scoured the Internet for 3 hours now and have
found nothing that even remotely holds up on megabytes of hex to ascii
conversion. Here's what I have so far:
for (my $i = 0; $i < length($file_raw_hex); $i += 2)
{
    $file_raw .= pack('H2', substr($file_raw_hex, $i, 2));
}

This is the slowest, coming in at about 2 seconds per meg on a 2.0 GHz
P4.

Then this is slightly faster:
$file_raw_hex =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg;

Comes in at 1.5 seconds per meg.

But there's got to be something that can do better than this. This is
a modern CPU, on a modern OS (Linux) with fast SCSI disks.... there is
no other bottleneck here. This code is dog slow.

Does anyone have any suggestions? I have been trying to figure out if
Bit::Vector could help but to no avail (Bit::Vector has no ascii
abilities as far as I know - it only converts between
decimal/hex/octal). I would love if someone has a module to suggest
that uses XS code.

Thanks
Mark
 
A. Sinan Unur

Mark H said:
I have been beating myself over the head looking for a faster hex to
ascii routine. I've scoured the Internet for 3 hours now and have
found nothing that even remotely holds up on megabytes of hex to ascii
conversion. Here's what I have so far:
for (my $i = 0; $i < length($file_raw_hex); $i += 2)
{
    $file_raw .= pack('H2', substr($file_raw_hex, $i, 2));
}

This is the slowest, coming in at about 2 seconds per meg on a 2.0 GHz
P4.

Then this is slightly faster:
$file_raw_hex =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg;

Comes in at 1.5 seconds per meg.

But there's got to be something that can do better than this. This is
a modern CPU, on a modern OS (Linux) with fast SCSI disks.... there is
no other bottleneck here. This code is dog slow.

How about line-by-line or block-by-block processing?
Here is something quick'n'dirty:

#!/usr/bin/perl

use strict;
use warnings;

open my $in, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";

my ($data, $buffer);

{
    local $,;
    while (sysread $in, $buffer, 4096) {
        my @lines = split /\n/, $buffer;
        @lines = map { s{([[:xdigit:]]{2})}{chr(hex $1)}eg } @lines;
        $data .= "@lines";
    }
}

__END__

D:\Home\asu1\UseNet\clpmisc\hex> tail -n 3 hexfile
EAFC3885140E9010FFD505127FC20C62F47202C403B9B66F8DC88EC542A0D0888A7522911128B559
BF7E364E624A0651D01BBD4ACFAC813686AF489AC0246DC9CBDFC7D43662AB9D41C3EDEE34AE6DFC
7D402B3CC7D47DF8DF785689AE243A970963E458A6981C20FB81D13F511DF287CDB11F66C0F2A8FE

D:\Home\asu1\UseNet\clpmisc\hex> dir hexfile

02/08/2006 03:52 PM 2,050,000 hexfile

D:\Home\asu1\UseNet\clpmisc\hex> timethis read.pl hexfile

TimeThis : Command Line : read.pl hexfile
TimeThis : Start Time : Wed Feb 08 16:23:35 2006
TimeThis : End Time : Wed Feb 08 16:23:37 2006
TimeThis : Elapsed Time : 00:00:01.859

which translates to a little less than a second per megabyte on my
AMD64 1.8 GHz laptop (running at 800 MHz on batteries) with Windows XP SP2.

See what results you get on your system.

And, please, the next time post a complete program that we can run
by copying and pasting.
 
Mark H

Hi Sinan,

Thank you for throwing in your hat here to help!

Your program doesn't do what I assume you think it does. Yes, it seems
very fast. But it doesn't actually output in ASCII. It turned my hex
into a series of numbers that made no sense.

Best,
Mark
 
Mark H

Not sure if this would make things any faster but our hex data is
already in memory in a $variable with no \n's in it. So splitting
isn't necessary (it's not line-by-line data)... it's just megs of solid
hex.

Mark
 
Mark H

Somehow I am having a hard time believing that no XS module exists for
this. It's so simple to write hex to ascii conversion in C and I would
be surprised that no one has invented a simple module to handle this
with great speed...

Best,
Mark
 
ednotover

Mark said:
Here's what I have so far:
for (my $i = 0; $i < length($file_raw_hex); $i += 2)
{
    $file_raw .= pack('H2', substr($file_raw_hex, $i, 2));
}

$file_raw_hex =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg;

Why not just pack it all in one fell swoop?

$file_raw = pack 'H*', $file_raw_hex;

Ed
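
For readers who want to check that the one-shot pack really is a drop-in replacement for the loop, here is a minimal sketch. The sample string is the first 36 hex digits of the test data quoted elsewhere in this thread; the variable names $loop and $whole are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First 36 hex digits of the sample line quoted in this thread.
my $file_raw_hex = '5468697320697320612074657374202E2E2E';

# The original approach: one pack call per pair of hex digits.
my $loop = '';
for (my $i = 0; $i < length $file_raw_hex; $i += 2) {
    $loop .= pack 'H2', substr($file_raw_hex, $i, 2);
}

# The one-shot approach: a single pack call over the whole string.
my $whole = pack 'H*', $file_raw_hex;

print "$whole\n";                                   # This is a test ...
print $loop eq $whole ? "same\n" : "different\n";   # same
```

One caveat worth keeping in mind: pack 'H*' does not validate its input. Newlines or other non-hex characters are converted to nybbles rather than skipped, so they need to be stripped first (Mark's data has none).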
 
A. Sinan Unur

[ Please quote an appropriate amount of context when replying ]
....
$file_raw_hex =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
....

#!/usr/bin/perl

use strict;
use warnings;

open my $in, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";

my ($data, $buffer);

{
    local $,;
    while (sysread $in, $buffer, 4096) {
        my @lines = split /\n/, $buffer;
        @lines = map { s{([[:xdigit:]]{2})}{chr(hex $1)}eg } @lines;
        $data .= "@lines";
    }
}

__END__

Mark H said:
Your program doesn't do what I assume you think it does.
Yes, it seems very fast. But it doesn't actually output in ASCII.

Well, it depends on what is in your input file. I copied the chr(hex $1)
straight from your code.

Is it possible that you are actually reading a binary file, and what
you are looking for is

perldoc -f ord

I did, however, notice a couple of unintentional bugs in the code I
posted above.
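
A likely explanation for the "series of numbers" is the map line: s///g evaluates to the number of substitutions it made, so assigning the result of map back to @lines replaces the decoded text with counts. In addition, "@lines" joins elements with $" (a space by default), which local $, does not change. The sketch below reproduces the effect on hypothetical two-line input; the names @via_map, @counts and @via_for are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: "Hi" and "ok" written as hex digits.
my @lines = ('4869', '6F6B');

# s///g evaluates to the number of substitutions made, so map
# collects those counts; the decoded text only exists as a side
# effect on the elements of @via_map.
my @via_map = @lines;
my @counts  = map { s{([[:xdigit:]]{2})}{chr(hex $1)}eg } @via_map;
print "map returned: @counts\n";               # map returned: 2 2

# Looping with a statement modifier edits each element in place and
# ignores the return value of s///.
my @via_for = @lines;
s{([[:xdigit:]]{2})}{chr(hex $1)}eg for @via_for;
print "decoded: ", join('', @via_for), "\n";   # decoded: Hiok
```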

Please post a couple of sample lines from the input file.

Here is what I have (repeated 25,000 times) in the file that I am using:

5468697320697320612074657374202E2E2E205468697320697320612074657374202E2E2E205468

That is, this is a text file, consisting of hex digits. This is consistent
with what you posted.

#!/usr/bin/perl

use strict;
use warnings;

open my $in, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";

my ($data, $buffer);

my $crlf = '\015\012';

while (sysread $in, $buffer, 4096) {
    my @lines = split /$crlf/, $buffer;
    s{([[:xdigit:]]{2})}{chr(hex $1)}eg for @lines;
    $data .= join('', @lines);
}

close $in or die $!;

open my $out, '>', 'ascii' or die $!;
print $out $data, "\n";
close $out or die $!;

__END__
 
Mark H

Ed takes the prize on this one. THANK YOU! I don't know why, when I go
searching for hex to ascii converters, people have for years been
suggesting all of this other code when Ed's does everything you need it
to do at 100 times the speed (literally!). The processing time per
meg went from 2 seconds to 0.02 seconds.

Is there something I missed about why so many do it other ways?

Best,
Mark
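
To reproduce the comparison on your own machine, something along these lines should work. It is only a sketch: it uses the core Benchmark module, the sub names (by_substr, by_regex, by_pack) and the 65,536-digit test string are assumptions chosen for the example, and the figures it prints will differ from the ones quoted in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: "This" in hex, repeated to 65,536 hex digits.
my $hex = '54686973' x 8_192;

# Mark's first version: one pack per pair of hex digits.
sub by_substr {
    my $raw = '';
    for (my $i = 0; $i < length $hex; $i += 2) {
        $raw .= pack 'H2', substr($hex, $i, 2);
    }
    return $raw;
}

# Mark's second version, applied to a copy so $hex stays intact.
sub by_regex {
    (my $raw = $hex) =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg;
    return $raw;
}

# Ed's version: a single pack over the whole string.
sub by_pack {
    return pack 'H*', $hex;
}

# Sanity check before timing: all three must agree.
die "results differ\n"
    unless by_substr() eq by_regex() and by_regex() eq by_pack();

# Run each candidate for at least one CPU second and print a
# comparison table of rates.
cmpthese(-1, {
    substr_loop => \&by_substr,
    regex       => \&by_regex,
    pack_all    => \&by_pack,
});
```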
 
Mark H

Thank you Jim for your detailed reply on this. I do see some of your
points about this not being a typical operation. But this is what
Perl's best at: The Atypical. I doubted her for a while, convinced
we'd be coding sections in C but she pulled through in the end, as
usual.

Thanks to everyone who helped on this. It's my hope that when the
next person comes along to search for "hex to ascii" perl fastest, this
result will now come up with help.

Mark
 
Anno Siegel

Mark H said:
Somehow I am having a hard time believing that no XS module exists for
this. It's so simple to write hex to ascii conversion in C and I would
be surprised that no one has invented a simple module to handle this
with great speed...

"pack 'H*'" is that code, right in the Perl core.

The slowness of your solution comes from splitting the data into
one-byte pieces. Use a reasonable chunk size and it will be fast
enough.

Anno
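
If the hex text is too large to hold in memory at once, the chunked approach Anno describes might look roughly like this. It is a sketch, not a tested utility: the sub name hex_fh_to_raw and the chunk size are made up, and an in-memory filehandle stands in for a real file so the example is self-contained. The one subtlety is that a fixed-size read can split a pair of hex digits, so a trailing odd digit is carried over to the next chunk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert hex text from a filehandle, $chunk_size bytes at a time.
sub hex_fh_to_raw {
    my ($in, $chunk_size) = @_;
    my ($raw, $carry) = ('', '');
    while (read $in, my $buffer, $chunk_size) {
        # Drop newlines and anything else that is not a hex digit.
        $buffer =~ tr/0-9A-Fa-f//cd;
        $buffer = $carry . $buffer;
        # Hold back a trailing odd digit so no pair is split
        # across two reads.
        $carry = length($buffer) % 2 ? chop($buffer) : '';
        $raw .= pack 'H*', $buffer;
    }
    warn "Ignoring dangling hex digit\n" if length $carry;
    return $raw;
}

# Demo input: three lines of hex read through an in-memory handle.
my $text = "54686973\n20697320\n612074657374\n";
open my $in, '<', \$text or die "Cannot open in-memory handle: $!";

# A deliberately tiny, odd chunk size to exercise the carry logic;
# a real program would use something like 64 * 1024.
my $raw = hex_fh_to_raw($in, 5);
close $in;

print "$raw\n";   # This is a test
```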
 
Tad McClellan

Mark H said:
It's so simple to write hex to ascii conversion in C


Then write a hex to ascii conversion in C, and your problem is solved!

Unless there is some compelling reason to use a particular
programming language.

Is there such a reason?
 
