utf-8 to ascii

mail2atulmehta

I have a question: how can I generate two files, one in UTF-8 and the
other in ASCII, with the same column length, so that when I convert
from UTF-8 to ASCII the column length does not change? Any help is
appreciated. Thanks.
 
Richard Bos

> I have a question: how can I generate two files, one in UTF-8 and the
> other in ASCII, with the same column length, so that when I convert
> from UTF-8 to ASCII the column length does not change?

Depends. What is "column length" in UTF-8? Is it the number of
UTF-8-encoded characters, the number of chars (bytes) in that encoding,
or something else again? Note that for non-ASCII characters, the first
count is smaller than the second. Also, what are you going to do with
those characters? How will you map U+0641 to ASCII?
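
To make the two counts concrete, here is a minimal sketch in C. The function names are made up for illustration, and the code assumes the input is already well-formed UTF-8; it does no validation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes (C chars) in the encoded string, excluding the
   terminating null. */
size_t utf8_byte_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of code points, ASSUMING s is well-formed UTF-8: every byte
   that is not a continuation byte (bit pattern 10xxxxxx) starts a new
   code point. */
size_t utf8_code_point_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "a" followed by U+0641 (encoded as the bytes 0xD9 0x81), the first function reports 3 and the second reports 2, so "column length" has to be pinned down before the two files can be compared.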

(Note that a strict interpretation of what you wrote would result in a
trivial implementation: if a UTF-encoded character is not ASCII, it
cannot be converted to ASCII, so the whole conversion fails because of
malformed input - but if all input _is_ ASCII, then it has the same
encoding in UTF-8 as in ASCII in the first place, and no conversion is
necessary. This is not likely to be an acceptable solution ;-) )
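
A rough sketch of both options, under the assumption that "column length" means the number of characters: a strict converter that rejects any non-ASCII input (the trivial reading above), and a lossy one that writes one placeholder per non-ASCII code point so the character count is preserved. Both function names and the placeholder idea are illustrative only, and the lossy version assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Strict reading: copy src to dst only if every byte is ASCII.
   Returns 0 on success, -1 on a non-ASCII byte or if dst is too small. */
int utf8_to_ascii_strict(const char *src, char *dst, size_t dst_size)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; src[i] != '\0'; ++i) {
        if ((unsigned char)src[i] > 0x7F)
            return -1;              /* not representable in ASCII */
        if (i + 1 >= dst_size)
            return -1;              /* no room for byte + terminator */
        dst[i] = src[i];
    }
    if (dst_size == 0)
        return -1;
    dst[i] = '\0';
    return 0;
}

/* Lossy reading: emit one placeholder per non-ASCII code point, so the
   output has as many characters as the input has code points.
   ASSUMES well-formed UTF-8. Returns 0 on success, -1 if dst is too small. */
int utf8_to_ascii_lossy(const char *src, char *dst, size_t dst_size,
                        char placeholder)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; ++src) {
        unsigned char b = (unsigned char)*src;

        if ((b & 0xC0) == 0x80)
            continue;               /* continuation byte: already counted */
        if (out + 1 >= dst_size)
            return -1;
        dst[out++] = (b < 0x80) ? (char)b : placeholder;
    }
    if (dst_size == 0)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

Whether a placeholder is acceptable depends entirely on what the ASCII file is for; it keeps the columns aligned but throws the original characters away.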

Richard
 
SM Ryan

# I have a question: how can I generate two files, one in UTF-8 and the
# other in ASCII, with the same column length, so that when I convert
# from UTF-8 to ASCII the column length does not change? Any help is
# appreciated. Thanks.

If you restrict yourself to the ASCII codes x01 through x7E, UTF-8 and
ASCII are identical. x00 is sometimes remapped to an unused Unicode
character, and I don't remember whether x7F is the same in both.
 
Clark S. Cox III

# I have a question: how can I generate two files, one in UTF-8 and the
# other in ASCII, with the same column length, so that when I convert
# from UTF-8 to ASCII the column length does not change? Any help is
# appreciated. Thanks.

> If you restrict yourself to the ASCII codes x01 through x7E, UTF-8 and
> ASCII are identical. x00 is sometimes remapped to an unused Unicode
> character, and I don't remember whether x7F is the same in both.

<pedantic>
By definition, UTF-8 and ASCII are identical in the range [0, 0x7F],
period. No exception for 0x00 or 0x7F.
</pedantic>
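
For illustration, a small encoder sketch shows why: the first branch of the standard UTF-8 encoding rule emits any code point up to 0x7F as a single byte with the same value. The function name is hypothetical, and this is a sketch rather than a vetted library routine.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into out[].
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp <= 0x7FUL) {                 /* ASCII range: byte == code point */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FFUL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;                       /* surrogates are not encodable */
    if (cp <= 0xFFFFUL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling it with 0x00 or 0x7F takes the same one-byte branch as every other ASCII value, while U+0641 comes out as the two bytes 0xD9 0x81.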
 
Villy Kruse

> I have a question: how can I generate two files, one in UTF-8 and the
> other in ASCII, with the same column length, so that when I convert
> from UTF-8 to ASCII the column length does not change? Any help is
> appreciated. Thanks.

The closest ANSI C comes to this issue is mbtowc() and related functions.
However, a multibyte character may be UTF-8, but it could also be something
else, and a wide character could be Unicode or it may not be. In the interval
0x00 through 0xff the Unicode value of a character is identical to its
ISO-8859-1 value.
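
As a sketch of what that looks like, the following counts multibyte characters with mbtowc(). The function name is invented for this example, and the result depends on the current LC_CTYPE locale, so it only counts UTF-8 characters if the program has selected a UTF-8 locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the multibyte characters in s as defined by the current
   LC_CTYPE locale. Returns (size_t)-1 if s contains a byte sequence
   that is invalid in that locale. */
size_t mb_char_count(const char *s)
{
    size_t count = 0;
    size_t remaining;

    assert(s != NULL);
    remaining = strlen(s);
    (void)mbtowc(NULL, NULL, 0);        /* reset any shift state */
    while (remaining > 0) {
        wchar_t wc;
        int n = mbtowc(&wc, s, remaining);

        if (n < 0)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (n == 0)
            break;                      /* null character: end of string */
        s += n;
        remaining -= (size_t)n;
        ++count;
    }
    return count;
}
```

A caller would normally run setlocale(LC_CTYPE, "") from <locale.h> first; whether that yields a UTF-8 locale is up to the environment, which is exactly the portability caveat above.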

The bottom line is that the OS or some third party library may provide
the required conversion functions.

Villy
 
