A question about Charsets

  • Thread starter Eric-Roger Bruecklmeier

Eric-Roger Bruecklmeier

Hello Rubyists,

In a new application I have to read dBase III files which were generated
in a DOS environment. How can I convert the data into Windows codepages
from Ruby?

Thanks for any hints.

Eric.
 
Josef 'Jupp' SCHUGT

Hi!

* Eric-Roger Bruecklmeier; 2003-11-20, 20:30 UTC:
> In a new application I have to read dBase III files which were generated
> in a DOS environment. How can I convert the data into Windows codepages
> from Ruby?

Map each Byte to the corresponding one using a hash. You need the
codepages.

DOS codepages are listed here:

http://dwd.da.ru/charsets/index.html#dos-specific

Windows codepages are listed here:

http://dwd.da.ru/charsets/index.html#windows-specific

The mapping is troublesome for two reasons: first, not all DOS
characters have counterparts in the standard Windows codepage (Greek
letters, for example); second, codes 0..31 can be either control
characters or pictograms.

So the best you can do is use the above tables and create Arrays or
hashes that do the mapping.
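
As a rough illustration of such a table, here is a minimal Ruby sketch. It assumes the source data is cp437 or cp850 and the target is Windows-1252, and it only covers the German umlauts and the sharp s; the constant and method names are made up for this example, and a real table would need an entry for every byte that differs between the two codepages.

```ruby
# Partial DOS (cp437/cp850) -> Windows-1252 mapping, German letters only.
# Illustrative, not a complete table.
DOS_TO_WIN_MAP = {
  0x81 => 0xFC, # u umlaut
  0x84 => 0xE4, # a umlaut
  0x94 => 0xF6, # o umlaut
  0x8E => 0xC4, # A umlaut
  0x99 => 0xD6, # O umlaut
  0x9A => 0xDC, # U umlaut
  0xE1 => 0xDF  # sharp s
}.freeze

# Expand the hash into a 256-entry array once, so each byte costs only an
# array lookup. Bytes without an entry map to themselves.
DOS_TO_WIN_TABLE = Array.new(256) { |b| DOS_TO_WIN_MAP.fetch(b, b) }.freeze

# Returns a new binary string with every byte passed through the table.
def dos_to_win_table(data)
  data.bytes.map { |b| DOS_TO_WIN_TABLE[b] }.pack("C*")
end
```

Calling dos_to_win_table on the raw bytes of a field leaves plain ASCII untouched and rewrites only the listed codes.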

For cp850 and cp866 you can use iconv, otherwise you can use recode.
This can be done from Ruby but it requires the appropriate software
being in place. Bad if you want software to be portable.
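
For readers on Ruby 1.9 or later, the interpreter's built-in transcoding can stand in for iconv or recode here; this is a different technique from the one described above, not what was available when this thread was written. The sketch below assumes the files are cp850 and the target is Windows-1252; the method name is made up, and the choice to replace unmappable characters (such as box-drawing symbols) with "?" is just one possible policy.

```ruby
# Reinterpret raw DOS bytes as CP850, then transcode to Windows-1252.
# Characters with no Windows-1252 equivalent become "?".
def cp850_to_cp1252(raw)
  raw.dup
     .force_encoding("CP850")
     .encode("Windows-1252", undef: :replace, replace: "?")
end
```

The result is a string tagged as Windows-1252, so its bytes can be written out unchanged.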

Josef 'Jupp' Schugt
 
Eric-Roger Bruecklmeier

Josef said:
> Map each Byte to the corresponding one using a hash. You need the
> codepages.

That's the way I do it now, but it's slow :-(

> For cp850 and cp866 you can use iconv, otherwise you can use recode.
> This can be done from Ruby but it requires the appropriate software
> being in place. Bad if you want software to be portable.

Exactly that's the problem, the software has to be portable :-(

Thanks anyhow!

C YA

Eric.
 
Josef 'Jupp' SCHUGT

Hi!

* Eric-Roger Bruecklmeier; 2003-11-21, 13:01 UTC:
> That's the way I do it now, but it's slow :-(

When I find my code in tons of trouble, friends and colleagues come to
me, speaking words of wisdom: write in C. (Sung to: 'Let it be' by
the Beatles).

Speedup calls for a C extension. I'll skip the 'intro to C
extensions' stuff (Thomas and Hunt have that) and directly go to the
implementation of the mapping algorithm.

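Suppose s points to an array of unsigned char holding the len bytes to
be converted. Suppose you simply need to map code 0 to 1 and vice
versa. In that case use this:

    unsigned char *p;

    /* Walk the buffer by length: a loop that stops at the first
       0 byte would never reach the "case 0" branch. */
    for (p = s; p < s + len; p++) {
        switch (*p) {
        case 0: *p = 1; break;
        case 1: *p = 0; break;
        }
    }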

You don't need to map any char in the ASCII printable range which
saves a lot of coding. The resulting code is extremely fast.

> Exactly that's the problem, the software has to be portable :-(

The above code is extremely portable. An additional advantage: You
can give the codes in decimal or hexadecimal values.
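
For comparison, the same byte-for-byte substitution can be sketched in plain Ruby with String#tr, which does its looping in C inside the interpreter, so no extension has to be compiled. This is an alternative to the switch statement, not the approach proposed above. The byte lists are an assumed partial cp437/cp850 to Windows-1252 mapping (German letters only), and the names are made up for the example. The strings are handled as binary (String#b, Ruby 2.0 or later) so that each byte counts as one character.

```ruby
# Source bytes (DOS) and their replacements (Windows-1252), position by position.
DOS_BYTES = "\x81\x84\x94\x8E\x99\x9A\xE1".b.freeze
WIN_BYTES = "\xFC\xE4\xF6\xC4\xD6\xDC\xDF".b.freeze

# Returns a binary copy of data with each DOS byte replaced by its
# Windows counterpart; all other bytes are left alone.
def dos_to_win_tr(data)
  data.b.tr(DOS_BYTES, WIN_BYTES)
end
```

As with the C version, only the codes that differ need to be listed, and they can be written in hexadecimal.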

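For 16-bit codes things are more complicated. With p declared as
unsigned char * and len the buffer length in bytes, you then need

    for (p = s; p + 1 < s + len; p += 2) {
        /* High byte first; or the other way round, depending on byte
           order. The parentheses matter: + binds tighter than <<. */
        switch ((p[0] << 8) | p[1]) {
        case 0: p[0] = 0; p[1] = 1; break;
        case 1: p[0] = 0; p[1] = 0; break;
        }
    }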

and lots of additional cases.

Good luck,

Josef 'Jupp' Schugt
 
