Sorting EBCDIC

Chris Hamel

Greetings.

I am currently dealing with a dataset that is purely numeric, and some
time next year the data will be migrated to alphanumeric. The data is
structured such that the highest values represent the oldest entity
(reverse sort, essentially). This is significant to us when we process
the data in our system because there are times we need to sort the
orders from oldest to newest so we allocate them consistent with how
the source system allocates them.

Here's the rub: the source system is an IBM mainframe that is
assigning the values backwards in EBCDIC. Our system is on an AIX
server.

So, if I'm looking at four orders:
1 2 A B

The mainframe has created them in this order:
2 1 A B

But a descending sort in Perl will put them in this order:
B A 2 1

I looked in the Perl documentation, which basically suggests that I: 1)
get over it (not possible) 2) sort in only one system (not possible) or
3) convert, sort, re-convert. My only option of these is the third.

I'm thinking of something like this:

tr/0-9A-Za-z/a-zA-Z0-9/ for @list_of_orders;
@list_of_orders = sort @list_of_orders;
tr/a-zA-Z0-9/0-9A-Za-z/ for @list_of_orders;

This is conceptual, as the actual code will obviously have a lot more
gunk around it and resides in several programs. Am I on track? Are
there better ways? Has anyone done this before? Am I creating
efficiency problems that I'm not aware of? Is there a module in CPAN
that does this better that has managed to hide from my searches?

I'm not opposed to using a system call to the Unix sort if it has
advantages. Memory will not be an issue.

I am still in the conceptual design phase but am looking for any
feedback. We have several months before we have to do this, but before
I go off and start changing dozens of programs I wanted to see if
anyone had any feedback.

Thanks in advance,
Chris
 
Uri Guttman

CH> Here's the rub: the source system is an IBM mainframe that is
CH> assigning the values backwards in EBCDIC. Our system is on an AIX
CH> server.

CH> So, if I'm looking at four orders:
CH> 1 2 A B

CH> The mainframe has created them in this order:
CH> 2 1 A B

CH> But a descending sort in Perl will put them in this order:
CH> B A 2 1

CH> I looked in the Perl documentation, which basically suggests that I: 1)
CH> get over it (not possible) 2) sort in only one system (not possible) or
CH> 3) convert, sort, re-convert. My only option of these is the third.

CH> I'm thinking of something like this:

CH> tr/0-9A-Za-z/a-zA-Z0-9/ for @list_of_orders;
CH> @list_of_orders = sort @list_of_orders;
CH> tr/a-zA-Z0-9/0-9A-Za-z/ for @list_of_orders;

use the unix dd utility to do ebcdic/ascii conversion. you ain't gonna
easily get it right with tr///. there might be a cpan module for this
too and should be easy to find by searching for ebcdic

CH> This is conceptual, as the actual code will obviously have a lot
CH> more gunk around it and resides in several programs. Am I on
CH> track? Are there better ways? Has anyone done this before? Am I
CH> creating efficiency problems that I'm not aware of? Is there a
CH> module in CPAN that does this better that has managed to hide from
CH> my searches?

look at Sort::Maker which may save you a lot of work. you can describe
your sort keys and forward/reverse, do conversion one time (it will
return the sorted original data) and you could possibly do the
conversion on the fly if you find a module for it.
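
For illustration, the "convert once, get the original data back" idea can be sketched in core Perl as a Schwartzian transform, without Sort::Maker. This is only a sketch under assumptions: the sample keys are made up, the code page is assumed to be cp1047 (as supplied by the Encode distribution), and a descending sort is assumed to be what is wanted.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample keys; real order IDs would come from the data feed.
my @ids = qw(1 2 A B a b);

# Schwartzian transform: compute each EBCDIC key once, sort on the key
# bytes in descending order, then return the original strings.
my @sorted =
    map  { $_->[1] }
    sort { $b->[0] cmp $a->[0] }
    map  { [ encode('cp1047', $_), $_ ] } @ids;

print "@sorted\n";    # prints: 2 1 B A b a
```

The original strings are never modified, so no re-conversion step is needed after the sort.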

CH> I am still in the conceptual design phase but am looking for any
CH> feedback. We have several months before we have to do this, but before
CH> I go off and start changing dozens of programs I wanted to see if
CH> anyone had any feedback.

doesn't sound like too complex a project. first get a proper
specification of your sort keys (including any conversion). then code it
up in Sort::Maker and you should be done.

uri
 
Mumia W. (on aioe)

I don't have a solid grasp of what you're trying to do, but I think you
want to convert/decode those values from EBCDIC before doing the
comparisons.

Just in case I'm wrong, I decided to do sorting on "1 2 A B" three
different ways:


use strict;
use warnings;
no warnings 'once';

use Encode::EBCDIC ();
use Sort::Maker;

my @orders = qw(A 1 2 B);
my @sorts;
push @sorts, [ sort @orders ];
push @sorts, [ make_sorter(
    ST => string => sub { Encode::EBCDIC::decode("posix-bc",$_) },
)->(@orders) ];

my @ocopy = @orders;
tr/0-9A-Za-z/a-zA-Z0-9/ for (@ocopy);
@ocopy = sort @ocopy;
tr/a-zA-Z0-9/0-9A-Za-z/ for (@ocopy);
push @sorts, [ @ocopy ];

print "@{$sorts[0]} (ascii)\n";
print "@{$sorts[1]} (ebcdic)\n";
print "@{$sorts[2]} (tr//)\n";

__END__

The output is this:

1 2 A B (ascii)
2 1 A B (ebcdic)
1 2 A B (tr//)

I think you want to convert to or from EBCDIC, but I can't figure out
which it is--to or from :)

Nonetheless, this has to be close to what you're trying to achieve.
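
One possible reason the tr// line matches plain ASCII order is that the two transliterations are applied in the opposite direction to what an EBCDIC-style sort needs. Here is a minimal sketch with the directions swapped. It assumes the keys contain only 0-9, A-Z and a-z, and that the target order is lowercase, then uppercase, then digits, as in common EBCDIC code pages such as cp1047. The sample keys are made up.

```perl
use strict;
use warnings;

# Hypothetical sample keys.
my @orders = qw(A 1 2 B a b);

# Build a sort key for each order: the tr/// maps the EBCDIC sequence
# (a-z, A-Z, 0-9) position by position onto the ASCII sequence
# (0-9, A-Z, a-z), so comparing the mapped keys in ASCII gives the
# EBCDIC order of the originals. The originals are kept alongside the
# keys, so no reverse tr/// is needed afterwards.
my @ascending =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] }
    map  { (my $key = $_) =~ tr/a-zA-Z0-9/0-9A-Za-z/; [ $key, $_ ] } @orders;

my @descending = reverse @ascending;

print "@ascending (tr//, EBCDIC ascending)\n";     # a b A B 1 2
print "@descending (tr//, EBCDIC descending)\n";   # 2 1 B A b a
```

Keys containing any other characters (punctuation, accented letters) would need a real code page conversion instead.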

These modules might come in handy:

http://search.cpan.org/search?query=Encode::EBCDIC&mode=module
http://search.cpan.org/search?query=Sort::Maker&mode=module
 
Ben Morrow

Quoth Uri Guttman:
CH> Here's the rub: the source system is an IBM mainframe that is
CH> assigning the values backwards in EBCDIC. Our system is on an AIX
CH> server.

CH> So, if I'm looking at four orders:
CH> 1 2 A B

CH> The mainframe has created them in this order:
CH> 2 1 A B

CH> But a descending sort in Perl will put them in this order:
CH> B A 2 1

UG> use the unix dd utility to do ebcdic/ascii conversion. you ain't gonna
UG> easily get it right with tr///. there might be a cpan module for this
UG> too and should be easy to find by searching for ebcdic

Err... Encode? :)

AFAI understand your problem, you have a set of single-character
strings, and you want to sort them by EBCDIC code point. Something like
this should work (obviously, you will need to adjust for your code page):

#!/usr/bin/perl

use strict;
use warnings;

use Encode;
use Sort::Maker qw/make_sorter/;

my @ids = qw/1 2 A B/;

my $sorter = make_sorter
    plain =>
    number => {
        code => q{
            ord Encode::encode cp1047 => $_;
        },
        descending => 1,
    },
    or die;

print for $sorter->(@ids);

__END__

This actually prints
2
1
B
A
rather than 2 1 A B: was this a mistake above?

[Uri: why is it necessary to say C<Encode::encode>? Plain C<encode>
doesn't work, even if encode is imported. Is this just an artefact of
the fact that make_sorter uses eval ""?]

Ben
 
Uri Guttman

BM> my $sorter = make_sorter
BM>     plain =>
BM>     number => {
BM>         code => q{
BM>             ord Encode::encode cp1047 => $_;
BM>         },
BM>         descending => 1,
BM>     },

BM> [Uri: why is it necessary to say C<Encode::encode>? Plain C<encode>
BM> doesn't work, even if encode is imported. Is this just an artefact of
BM> the fact that make_sorter uses eval ""?]

it is the same issue that mumia discovered in clp.modules this week. the
eval is in a different module and encode() wasn't exported there so it
isn't seen. the fully qualified name works fine there. with the addition
of support for closures (being worked on now) i would expect you could use just
encode() since that closure will be compiled in the current context
which will see encode(). currently code refs in Sort::Maker are deparsed
and that text is used in building the sorter which is then evaled.
and there are other ways to work around that like with the init_code
option which could have a use Encode line and then the deparse/evaled
code would see encode(). init_code was created for these special cases
where you want to do something inside the sorter sub itself.
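
The namespace effect can be reproduced with core Perl alone. The sketch below uses a made-up package, Hypothetical::Builder, as a stand-in for any module that compiles code text with a string eval. It is not Sort::Maker's actual implementation, only an illustration of why the fully qualified name is needed.

```perl
use strict;
use warnings;
use Encode qw(encode);    # imports encode() into package main only

package Hypothetical::Builder;    # stand-in for a module that evals code text

sub build {
    my ($code_text) = @_;
    # The text is compiled here, in Hypothetical::Builder's namespace,
    # so unqualified sub names are looked up in this package, not in main.
    return eval 'sub { local $_ = $_[0]; ' . $code_text . ' }';
}

package main;

my $qualified   = Hypothetical::Builder::build(q{ ord(Encode::encode('cp1047', $_)) });
my $unqualified = Hypothetical::Builder::build(q{ ord(encode('cp1047', $_)) });

print $qualified->('A'), "\n";    # 193, the cp1047 code point for "A"

# The unqualified version compiles, but the call fails at run time because
# Hypothetical::Builder has no encode() of its own.
my $result = eval { $unqualified->('A') };
my $err    = $@;
print "unqualified call failed: $err" unless defined $result;
```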

uri
 
