LibXML and the British pound

Vic Russell

Hi,

I'm trying to get the British pound (£) symbol through LibXML and just get
"£" instead. Any ideas?

Vic
 
Julian F. Reschke

Vic Russell said:
Hi,

I'm trying to get the British pound (£) symbol through LibXML and just get
"£" instead. Any ideas?

It's the same thing. Where's the problem here?
 
Johannes Koch

Vic said:
I know this is the code, but I get "£" coming out literally in the
output.

If you really get "£" in the output (code), everything's fine. If
you get "£", there is something wrong.
 
Richard Tobin

Vic Russell said:
I know this is the code, but I get "£" coming out literally in the
output.

If your output is XML, that doesn't matter.

If your output *isn't* XML, you're probably outputting it the wrong way!

-- Richard
 
Patrick TJ McPhee

% I know this is the code, but I get "£" coming out literally in the
% output.

Leaving aside whether you should care or not, have you set the encoding
to iso-8859-1? If you use the default (utf-8), you won't get a single
character in any case.
 
Julian F. Reschke

Patrick TJ McPhee said:
% I know this is the code, but I get "£" coming out literally in the
% output.

Leaving aside whether you should care or not, have you set the encoding
to iso-8859-1? If you use the default (utf-8), you won't get a single
character in any case.

He will still get a single *character*. However, it will be represented by
more than one *byte*.
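
To make the character/byte distinction concrete, here is a minimal sketch in Python (not the Perl/LibXML setup from the original question). It only relies on the standard `str.encode` method. The variable names are made up for illustration.

```python
# One character, different byte counts depending on the encoding.
pound = "\u00a3"  # the pound sign, code point U+00A3

latin1_bytes = pound.encode("iso-8859-1")  # single-byte encoding
utf8_bytes = pound.encode("utf-8")         # variable-length encoding

print(len(pound))          # 1 (one character)
print(latin1_bytes.hex())  # a3 (one byte)
print(utf8_bytes.hex())    # c2a3 (two bytes)
```

The string has length 1 in both cases. Only the number of bytes on the wire changes.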
 
Patrick TJ McPhee

% % > In article <[email protected]>,
% >
% > % I know this is the code, but I get "£" coming out literally in the
% > % output.
% >
% > Leaving aside whether you should care or not, have you set the encoding
% > to iso-8859-1? If you use the default (utf-8), you won't get a single
% > character in any case.
%
% He will still get a single *character*. However, it will be represented by
% more than one *byte*.

Depends what you mean by a byte. Why not call it an octet if you want
to be anal about it?
 
Julian F. Reschke

Patrick TJ McPhee said:
% % > In article <[email protected]>,
% >
% > % I know this is the code, but I get "£" coming out literally in the
% > % output.
% >
% > Leaving aside whether you should care or not, have you set the encoding
% > to iso-8859-1? If you use the default (utf-8), you won't get a single
% > character in any case.
%
% He will still get a single *character*. However, it will be represented by
% more than one *byte*.

Depends what you mean by a byte. Why not call it an octet if you want
to be anal about it?

I could.

The issue is that a lot of the confusion about encodings is caused because
people do not grasp that a character is not a byte or an octet. Thus it
makes a lot of sense to get people to use the right terminology.
 
Vic Russell

Fixed it!! Thanks for all the clues.

I used:-
$xmldoc->setEncoding('UTF-8');

with LibXML and the good old British pound (£) symbol came out a treat.
There is life in the old dog yet!

Thanks again,

Vic
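
For readers who want to see the effect of the output encoding, here is a rough sketch using Python's standard `xml.etree.ElementTree` rather than Perl's XML::LibXML. The element name `price` is hypothetical, and whether LibXML behaves identically in every detail is an assumption. The general idea is that a serializer falls back to a numeric character reference when the output encoding cannot represent the character, and writes the character itself when it can.

```python
import xml.etree.ElementTree as ET

price = ET.Element("price")  # hypothetical element name
price.text = "\u00a35"       # the text "£5"

# ASCII cannot represent the pound sign, so a character reference is emitted.
ascii_xml = ET.tostring(price, encoding="us-ascii")
# UTF-8 can represent it, so the character is written directly (as two bytes).
utf8_xml = ET.tostring(price, encoding="utf-8")

print(ascii_xml)
print(utf8_xml)

# Both serializations parse back to the same text.
same = ET.fromstring(ascii_xml).text == ET.fromstring(utf8_xml).text == "\u00a35"
print(same)
```

To an XML parser the two outputs are equivalent. The difference only matters to a consumer that reads the raw bytes without parsing them as XML.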
 
Patrick TJ McPhee

% The issue is that a lot of the confusion about encodings is caused because
% people do not grasp that a character is not a byte or an octet.

Having dealt with people who are confused about encodings for more than
a decade, I'd say the confusion about encodings is caused by the almost
infinite and unquestionably needless variety of the things. The amount
of storage space used to represent each code point has nothing at all
to do with it.

As for using the `correct' words, I'm not convinced that imposing an
orthodoxy that goes against wide-spread usage is a good way to promote
understanding. A lot of meaning in English depends on context, and it's
irritating when people go around `correcting' completely unambiguous
statements because they think a word has some sacred meaning.
 
Julian F. Reschke

Patrick TJ McPhee said:
% The issue is that a lot of the confusion about encodings is caused because
% people do not grasp that a character is not a byte or an octet.

Having dealt with people who are confused about encodings for more than
a decade, I'd say the confusion about encodings is caused by the almost
infinite and unquestionably needless variety of the things. The amount
of storage space used to represent each code point has nothing at all
to do with it.

As for using the `correct' words, I'm not convinced that imposing an
orthodoxy that goes against wide-spread usage is a good way to promote
understanding. A lot of meaning in English depends on context, and it's
irritating when people go around `correcting' completely unambiguous
statements because they think a word has some sacred meaning.

Well, I have to disagree.

A character is not the same thing as a byte or an octet. It may be *encoded*
as one byte.

Most of the time when people are surprised by how UTF-8 works they think
they see multiple characters. However, what they really see is a single
character that is encoded into multiple bytes. The confusion is caused by
using the wrong tool to look at the byte stream (for instance an editor that
doesn't handle UTF-8), or using the right tool, but the encoding
meta-information was lost (such as when UTF-8 encoded content is sent to a
browser, but the content-type is wrong).
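
A small Python sketch of that failure mode, assuming the bytes are UTF-8 but the reader treats them as ISO-8859-1 (the variable names are illustrative only):

```python
# UTF-8 bytes for the pound sign, as they would travel over the wire.
utf8_bytes = "\u00a3".encode("utf-8")    # b'\xc2\xa3'

# Decoding with the wrong encoding turns each byte into its own character.
wrong = utf8_bytes.decode("iso-8859-1")  # 'Â£'
# Decoding with the right encoding recovers the single character.
right = utf8_bytes.decode("utf-8")       # '£'

print(len(wrong), len(right))  # 2 1
```

The bytes are identical in both cases. Only the interpretation differs, which is why the "extra" character appears.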

Do you really think that trying to explain this won't help?
 
Patrick TJ McPhee

[...]

% Do you really think that trying to explain this won't help?

It really depends on the context. Most of the time, it doesn't, because
an unsurprisingly large part of the population doesn't much care. For
instance, without waiting to see how this thread turned out, the OP
has evidently solved his problem and got on with his life.
 
