UTF-8 encoding problem

shreshth.luthra

Hi All,

I have a GUI that accepts a Unicode string and searches a given
set of XML files for that string.

I have two XML files, both saved as UTF-8 and containing
characters from different languages.

Both files start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.

Now, when I search for a character from one of those languages in that
directory using a third-party desktop search GUI, it reports that the
character exists in the first file (the one that also has the XML
declaration), but not in the second file (which has only the BOM).

Initially I thought the problem was mainly because of UTF-8
supporting both multibyte and Unicode, but I could not find much on it.

Please help.

Regards,
Shreshth
 
Ron Natalie

Initially I thought the problem was mainly because of UTF-8
supporting both multibyte and Unicode, but I could not find much on it.

What does this have to do with C++ at all?
UTF-8 is a multibyte encoding of Unicode (which is effectively
a 32-bit character space), but I doubt that's your problem.
Your problem is that your document doesn't conform to the document
rules that the search program is using.
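
To illustrate the multibyte point, here is a minimal Python sketch (not from the thread; the sample characters and the `utf8_length` helper are illustrative assumptions) showing that UTF-8 spends between one and four bytes per Unicode code point:

```python
# Sample code points chosen for illustration: U+0041, U+00E9, U+20AC, U+1F600.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    # Print the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")
```

The same text is therefore both "Unicode" (the character set) and "multibyte" (the byte-level encoding); the two terms describe different layers rather than alternatives.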
 
shreshth.luthra

I know this has nothing to do with C++ in particular, but where better
to ask such a question?

Anyway,

Your problem is that your document doesn't conform to the document
rules that the search program is using.

I don't understand what you mean by this.
Of course I cannot do anything about the search program (which is
surely using Unicode).

But the question is: if both files are in UTF-8, why does the
search program work only for the one that also has UTF-8 in its XML
declaration?
Does the declaration really make any difference in this regard?

Thanks for your reply.

Shreshth
 
loufoque

Both files start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.

BOMs are quite useless for UTF-8. They are entirely optional.
And according to the XML spec (AFAIK), the default encoding when no
encoding is declared is UTF-8.
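
A rough way to check what each file actually contains is sketched below in Python. This is not from the thread: `describe_encoding` is a hypothetical helper, it only looks for the UTF-8 BOM (UTF-16 input is assumed not to occur), and the "effective" value simply applies the UTF-8 default described above.

```python
import codecs
import re

# Matches an encoding="..." pseudo-attribute inside a leading XML declaration.
XML_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def describe_encoding(data: bytes) -> dict:
    """Report whether the bytes start with a UTF-8 BOM and which
    encoding, if any, the XML declaration names."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Skip the three BOM bytes so the declaration sits at offset 0.
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    match = XML_DECL_ENCODING.match(body)
    declared = match.group(1).decode("ascii") if match else None
    # Assumption: with no declaration (and no UTF-16 BOM), treat it as UTF-8.
    return {"bom": has_bom, "declared": declared, "effective": declared or "UTF-8"}


first = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><doc/>'
second = codecs.BOM_UTF8 + b'<?xml version="1.0"?><doc/>'
print(describe_encoding(first))
print(describe_encoding(second))
```

Under these assumptions both files come out as UTF-8, so a tool that finds text in one but not the other is not applying that default.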

Now, when I search for a character from one of those languages in that
directory using a third-party desktop search GUI, it reports that the
character exists in the first file (the one that also has the XML
declaration), but not in the second file (which has only the BOM).

OK, so you have a problem with your broken third-party application.
How is that related to C++?

Initially I thought the problem was mainly because of UTF-8
supporting both multibyte and Unicode, but I could not find much on it.

Like most of your message, what you say just doesn't make much sense.

Please help.

Getting a basic understanding of what Unicode and its encoding formats
are would surely help.
 
loufoque

Ron said:
Unicode (which is effectively
a 32-bit character space)

Unicode only reserves 2^20 + 2^16 code points (U+0000 through U+10FFFF).
21 bits is more than enough to store that.
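
The arithmetic can be checked with a few lines of Python (an illustrative sketch, not from the thread; it relies on `sys.maxunicode` being 0x10FFFF, which holds for Python 3.3 and later):

```python
import sys

# Size of the Unicode code space: 16 supplementary planes plus the BMP.
code_points = 2**20 + 2**16

# The highest code point is U+10FFFF, so the space has 0x110000 entries.
assert code_points == 0x110000 == sys.maxunicode + 1

# Number of bits needed to represent the largest code point.
bits_needed = sys.maxunicode.bit_length()

print(code_points)   # 1114112
print(bits_needed)   # 21
```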
 
Peter Jansson

I know this has nothing to do with C++ in particular, but where better
to ask such a question?

The statement above is the best I have seen in a long time here.

If you know your question has "nothing to do with C++ in particular",
then why do you ask in a newsgroup dedicated to the C++ language? That
is like asking for help with your car in a bicycle shop.

You will probably get a much better response if you ask in a forum
dedicated to your problem.

Sincerely,

Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
 
