LibXML and the British pound

Vic Russell · Oct 14, 2003

Hi,

I'm trying to get the British pound (£) symbol through LibXML and just get
"&#163;" instead. Any ideas?

Vic

Julian F. Reschke · Oct 14, 2003

Vic Russell said:
Hi,

I'm trying to get the British pound (£) symbol through LibXML and just get
"&#163;" instead. Any ideas?

It's the same thing. Where's the problem here?

arachno · Oct 14, 2003

<skip:sarcasm>maybe problem is that it's not british enough
=)</skip:sarcasm>

Vic Russell · Oct 14, 2003

Euroverthetop there but thanks

Vic Russell · Oct 14, 2003

I know this is the code, but I get "&#163;" coming out literally in the
output.

Johannes Koch · Oct 14, 2003

Vic said:
I know this is the code, but I get "&#163;" coming out literally in the
output.

If you really get "&#163;" in the output (code), everything's fine. If
you get "&amp;#163;", there is something wrong.

Richard Tobin · Oct 14, 2003

Vic Russell said:
I know this is the code, but I get "&#163;" coming out literally in the
output.

If your output is XML, that doesn't matter.

If your output *isn't* XML, you're probably outputting it the wrong way!

-- Richard

Patrick TJ McPhee · Oct 14, 2003

% I know this is the code, but I get "&#163;" coming out literally in the
% output.

Leaving aside whether you should care or not, have you set the encoding
to iso-8859-1? If you use the default (utf-8), you won't get a single
character in any case.

Julian F. Reschke · Oct 14, 2003

Patrick TJ McPhee said:
% I know this is the code, but I get "&#163;" coming out literally in the
% output.

Leaving aside whether you should care or not, have you set the encoding
to iso-8859-1? If you use the default (utf-8), you won't get a single
character in any case.

He will still get a single *character*. However, it will be represented by
more than one *byte*.
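
To make the character/byte distinction concrete, here is a small Python sketch (Python rather than Perl, chosen purely for illustration; the variable names are hypothetical). It encodes the one-character string "£" under the two encodings mentioned above.

```python
# The pound sign is a single character (code point U+00A3) regardless of encoding.
pound = "\u00a3"

utf8_bytes = pound.encode("utf-8")         # two bytes: 0xC2 0xA3
latin1_bytes = pound.encode("iso-8859-1")  # one byte: 0xA3

print(len(pound), len(utf8_bytes), len(latin1_bytes))  # prints: 1 2 1
```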

Patrick TJ McPhee · Oct 15, 2003

% > % I know this is the code, but I get "&#163;" coming out literally in the
% > % output.
% >
% > Leaving aside whether you should care or not, have you set the encoding
% > to iso-8859-1? If you use the default (utf-8), you won't get a single
% > character in any case.
%
% He will still get a single *character*. However, it will be represented by
% more than one *byte*.

Depends what you mean by a byte. Why not call it an octet if you want
to be anal about it?

Julian F. Reschke · Oct 15, 2003

Patrick TJ McPhee said:
% > % I know this is the code, but I get "&#163;" coming out literally in the
% > % output.
% >
% > Leaving aside whether you should care or not, have you set the encoding
% > to iso-8859-1? If you use the default (utf-8), you won't get a single
% > character in any case.
%
% He will still get a single *character*. However, it will be represented by
% more than one *byte*.

Depends what you mean by a byte. Why not call it an octet if you want
to be anal about it?

I could.

The issue is that a lot of the confusion about encodings is caused because
people do not grasp that a character is not a byte or an octet. Thus it
makes a lot of sense to get people to use the right terminology.

Vic Russell · Oct 15, 2003

Fixed it!! Thanks for all the clues.

I used:-
$xmldoc->setEncoding('UTF-8');

with LibXML and the good old British pound (£) symbol came out a treat.
There is life in the old dog yet!

Thanks again,

Vic
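
For readers who want to see why the output encoding matters, here is a rough analogue in Python's standard xml.etree.ElementTree. This is not XML::LibXML, and whether LibXML behaves identically in every case is an assumption; the element name "price" is made up. The sketch serializes the same document once as ASCII and once as UTF-8.

```python
import xml.etree.ElementTree as ET

price = ET.Element("price")
price.text = "\u00a35"  # pound sign followed by "5"

# ASCII cannot represent U+00A3, so the serializer falls back to a
# numeric character reference.
as_ascii = ET.tostring(price, encoding="us-ascii")

# UTF-8 can represent it directly, as the two bytes 0xC2 0xA3.
as_utf8 = ET.tostring(price, encoding="utf-8")

print(b"&#163;" in as_ascii)   # True
print(b"\xc2\xa3" in as_utf8)  # True

# Both serializations parse back to the same text.
print(ET.fromstring(as_ascii).text == ET.fromstring(as_utf8).text)  # True
```

Either form is equivalent to an XML parser; the difference only shows up when the bytes are read as something other than XML.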

Patrick TJ McPhee · Oct 16, 2003

% The issue is that a lot of the confusion about encodings is caused because
% people do not grasp that a character is not a byte or an octet.

Having dealt with people who are confused about encodings for more than
a decade, I'd say the confusion about encodings is caused by the almost
infinite and unquestionably needless variety of the things. The amount
of storage space used to represent each code point has nothing at all
to do with it.

As for using the `correct' words, I'm not convinced that imposing an
orthodoxy that goes against wide-spread usage is a good way to promote
understanding. A lot of meaning in English depends on context, and it's
irritating when people go around `correcting' completely unambiguous
statements because they think a word has some sacred meaning.

Julian F. Reschke · Oct 16, 2003

Patrick TJ McPhee said:
% The issue is that a lot of the confusion about encodings is caused because
% people do not grasp that a character is not a byte or an octet.

Having dealt with people who are confused about encodings for more than
a decade, I'd say the confusion about encodings is caused by the almost
infinite and unquestionably needless variety of the things. The amount
of storage space used to represent each code point has nothing at all
to do with it.

As for using the `correct' words, I'm not convinced that imposing an
orthodoxy that goes against wide-spread usage is a good way to promote
understanding. A lot of meaning in English depends on context, and it's
irritating when people go around `correcting' completely unambiguous
statements because they think a word has some sacred meaning.

Well, I have to disagree.

A character is not the same thing as a byte or an octet. It may be *encoded*
as one byte.

Most of the time when people are surprised by how UTF-8 works they think
they see multiple characters. However, what they really see is a single
character that is encoded into multiple bytes. The confusion is caused by
using the wrong tool to look at the byte stream (for instance an editor that
doesn't handle UTF-8), or using the right tool, but the encoding
meta-information was lost (such as when UTF-8 encoded content is sent to a
browser, but the content-type is wrong).

Do you really think that trying to explain this won't help?
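
The "lost meta-information" case can be reproduced in a few lines. This is a Python sketch with hypothetical variable names: it takes the UTF-8 bytes for the pound sign and decodes them once with the wrong encoding and once with the right one.

```python
pound = "\u00a3"
raw = pound.encode("utf-8")          # the byte stream: 0xC2 0xA3

# A tool that assumes ISO-8859-1 maps each byte to its own character.
misread = raw.decode("iso-8859-1")   # "\u00c2\u00a3", displayed as two characters

# A tool that knows the stream is UTF-8 recovers the single character.
correct = raw.decode("utf-8")

print(len(misread), len(correct))    # prints: 2 1
```

The bytes are identical in both cases; only the interpretation differs.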

Patrick TJ McPhee · Oct 17, 2003

[...]

% Do you really think that trying to explain this won't help?

It really depends on the context. Most of the time, it doesn't, because
an unsurprisingly large part of the population doesn't much care. For
instance, without waiting to see how this thread turned out, the OP
has evidently solved his problem and got on with his life.
