Erik Sandblom said:
in article
[email protected], Ben Morrow wrote at
(e-mail address removed) on 04-02-12 21.45:
Yes, that's right. I'm finally getting the hang of hexadecimal and I've
deduced that "16-bit" comes from 2 to the power of four being 16. But what does
that really mean, considering each "digit", as I still call them, can have
16 different numbers, and not 2? That would be 16 to the power of four which
is a large number, about 66 000 unless I'm mistaken.

Not quite. 'Bits' refers to the binary representation (base 2, just as
hex is base 16) of a number. A 2-digit hex number, say 0x82, can also
be written as an 8-digit binary number (an 8-bit number: 'bit' is
short for 'binary digit'): 0b1000_0010. The 0x here indicates hex, and
the 0b binary; the _s are just put in to make the number easier to
read.
Hexadecimal has 16 different digits, binary but 2; and as you say, 2^4
= 16, so each hex digit represents 4 binary digits. Thus a four-digit
hex number is a 4*4 = 16-bit binary number: as you say, there are
65536 of them.
You can get Perl to print out the decimal, hex and binary
representations of a number using sprintf with the %d, %x and %b
formats.
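
As a minimal sketch of that, using the 0x82 example from above (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $n = 0x82;                    # the same number as 0b1000_0010

my $dec = sprintf '%d', $n;      # decimal: "130"
my $hex = sprintf '%x', $n;      # hex:     "82"
my $bin = sprintf '%b', $n;      # binary:  "10000010"

print "decimal $dec, hex $hex, binary $bin\n";

# Four hex digits are 4*4 = 16 bits, so there are 16**4 == 2**16 of them.
my $count = 16 ** 4;             # 65536
printf "%d four-digit hex numbers, the largest being 0x%x\n", $count, 0xFFFF;
```

Each hex digit in the output lines up with one group of four binary digits: 8 is 1000 and 2 is 0010.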
Oh my goodness, that's a lot of characters. Why doesn't everyone just learn
English? ;-)
It is indeed a lot... most of them are unused at present, but there were
just too many, with all the Chinese-Japanese-Korean ideograms and all
the Arabic ligatures, to fit into 16 bits.
Well, what I've done is used latin-1 literals and saved the file in latin-1
encoding. Then I have used utf8 codes like \x{201D} to represent utf8
characters. I've written "use bytes" at the top of my perl script. Forgive
my ignorance but how would it behave differently with "use encoding latin1"
at the top?
'use bytes' disables Perl's Unicode support, and makes it treat all
strings as sequences of 8-bit bytes. When 'use bytes' is not in
effect, strings can be thought of as sequences of 21-bit numbers (in
fact, the representation is more compact than that, which occasionally
'leaks through' when things go wrong).
Under 'use bytes', you are declaring that your data is 'binary' as
opposed to 'textual'. The fact that if you treat it as textual Perl
will pretend it's Latin1 is for backwards compatibility only: I would
say that 'strictly' speaking Perl ought to give an error if you try
and use characters outside of ASCII (but then, Perl didn't get where
it is today by being strict about things). In fact, under 'use
bytes', even if you state some data is textual by pushing an :encoding
layer onto the filehandle, Perl will still treat the data as 8-bit
bytes; which is one of the ways the underlying representation can
'leak through' as I mentioned above.
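
A small sketch of what I mean, assuming a reasonably recent Perl 5 and using the \x{201D} character from your script as the sample (the variable names are made up for the example):

```perl
use strict;
use warnings;

my $s = "\x{201D}";              # one character: RIGHT DOUBLE QUOTATION MARK

# Without 'use bytes', length counts characters.
my $chars = length $s;           # 1

# 'use bytes' is lexically scoped, so it only applies inside this block.
# Here length counts the bytes of Perl's internal (UTF-8) representation.
my $bytes = do { use bytes; length $s };   # 3

print "characters: $chars, bytes: $bytes\n";
```

The 3 is the underlying representation showing through: Perl stores this character internally as three bytes, and 'use bytes' lets you see them.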
'use encoding 'latin1'' *just* declares that your source file is in
Latin1. It doesn't affect how Perl views your data at all: data that
comes from a filehandle marked with :raw will be considered to be
'binary', i.e. a sequence of 8-bit bytes; and data which comes from a
filehandle marked with :encoding will be considered to be 'textual',
i.e. a sequence of 21-bit Unicode codepoints.
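
To see the two kinds of filehandle side by side, here is a rough sketch. It reads the same three bytes twice from an in-memory filehandle rather than a real file, and it assumes the data is UTF-8 (so the layer is :encoding(UTF-8), not latin1); both choices are just for the sake of a self-contained example.

```perl
use strict;
use warnings;

# Three bytes: the UTF-8 encoding of U+201D.
my $data = "\xE2\x80\x9D";

# :raw  -> the data is 'binary': three separate 8-bit bytes.
open my $raw, '<:raw', \$data or die "open: $!";
my $binary = <$raw>;
close $raw;

# :encoding -> the data is 'textual': the bytes are decoded to one codepoint.
open my $txt, '<:encoding(UTF-8)', \$data or die "open: $!";
my $text = <$txt>;
close $txt;

printf "raw: %d bytes; decoded: %d character, U+%04X\n",
    length($binary), length($text), ord($text);
```

The bytes on 'disk' are identical in both cases; only the layer on the filehandle decides whether Perl hands you bytes or characters.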
This is all a little confusing: you may need to think about it a bit
before it sinks in. I know I did...
References: perldoc perluniintro, perldoc perlunicode, unicode.org,
perldoc PerlIO, perldoc PerlIO::encoding.
Ben