UTF-8 encoding problem

Discussion in 'C++' started by shreshth.luthra@gmail.com, Oct 18, 2006.

  1. Guest

    Hi All,

    I have a GUI that accepts a Unicode string and searches a given set
    of XML files for that string.

    I have two XML files, both saved in UTF-8 format and containing
    characters from different languages.

    Both files have a UTF-8 BOM, but only the first file also declares
    UTF-8 in the XML declaration at the top of the file.

    When I search that directory for a character from a different
    language using a third-party desktop search GUI, it reports that the
    character exists in the first file (which also has the XML
    declaration), but not in the second file (which has only the BOM).

    Initially I thought the problem was mainly because of UTF-8
    supporting both multibyte and Unicode, but I could not find much on it.

    Please help.

    Regards,
    Shreshth
    , Oct 18, 2006
    #1

  2. Ron Natalie Guest

    wrote:

    >
    > Initially I thought the problem was mainly because of UTF-8
    > supporting both multibyte and Unicode, but I could not find much on it.
    >
    >

    What does this have to do with C++ at all?
    UTF-8 is a multibyte encoding of the Unicode (which effectively
    is a 32 bit character space) but I doubt that's your problem.
    Your problem is your document isn't conforming with the document
    rules that the search program is using.
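
    As a rough illustration of what "multibyte encoding" means here, the
    sketch below turns a single Unicode code point into its UTF-8 bytes.
    The names encode_utf8 and check_examples are made up for this example,
    and the choice to return an empty string for invalid input is an
    assumption, not something from this thread.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Assumption for this sketch: invalid input (a surrogate in
// U+D800..U+DFFF, or anything above U+10FFFF) yields an empty string.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are not characters
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// A few spot checks: ASCII stays one byte, other characters take more.
void check_examples() {
    assert(encode_utf8(0x41) == "A");               // U+0041
    assert(encode_utf8(0xE9) == "\xC3\xA9");        // U+00E9
    assert(encode_utf8(0x20AC) == "\xE2\x82\xAC");  // U+20AC
    assert(encode_utf8(0x1F600).size() == 4);       // outside the BMP
}
```

    A search tool that compares raw bytes has to agree with the file on
    this encoding, otherwise a non-ASCII character will never match.
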
    Ron Natalie, Oct 18, 2006
    #2

  3. Guest

    I know this has nothing to do with C++ in particular, but where better
    to ask such a question?

    Anyways,
    >your problem is your document isn't conforming with the document
    > rules that the search program is using.


    I do not understand what you mean by this.
    Of course I cannot do anything about the search program (which is
    surely using Unicode).

    But the question is: if both files are in UTF-8 format, why does the
    search program work only for the one that also has UTF-8 in its XML
    declaration?
    Does it really make any difference in this regard?

    Thanks for your reply.

    Shreshth


    Ron Natalie wrote:
    > wrote:
    >
    > >
    > > Initially I thought the problem was mainly because of UTF-8
    > > supporting both multibyte and Unicode, but I could not find much on it.
    > >
    > >

    > What does this have to do with C++ at all?
    > UTF-8 is a multibyte encoding of the Unicode (which effectively
    > is a 32 bit character space) but I doubt that's your problem.
    > Your problem is your document isn't conforming with the document
    > rules that the search program is using.
    , Oct 18, 2006
    #3
  4. loufoque Guest

    wrote:

    > Both files have a UTF-8 BOM, but only the first file also declares
    > UTF-8 in the XML declaration at the top of the file.


    BOMs are quite useless for UTF-8. They are entirely optional.
    And according to the XML spec (AFAIK), the default encoding when no
    encoding is declared is UTF-8.
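
    To make the difference between the two files concrete, here is a
    minimal sketch that checks a buffer for the UTF-8 BOM (bytes EF BB BF)
    and pulls the encoding name out of the XML declaration, if there is
    one. All function names are hypothetical, the scan is a simplification
    rather than a real XML parser, and it assumes the file contents have
    already been read into a std::string. Falling back to "UTF-8" follows
    the default described above and ignores UTF-16 detection.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Returns the encoding named in the XML declaration, or an empty string
// if there is no declaration or it does not name an encoding.
// Simplified scan: it only looks inside a leading "<?xml ... ?>".
std::string declared_encoding(const std::string& bytes) {
    const std::size_t start = has_utf8_bom(bytes) ? 3 : 0;
    if (bytes.compare(start, 5, "<?xml") != 0) return "";
    const std::size_t end = bytes.find("?>", start);
    if (end == std::string::npos) return "";
    const std::size_t key = bytes.find("encoding", start);
    if (key == std::string::npos || key > end) return "";
    const std::size_t open = bytes.find_first_of("\"'", key);
    if (open == std::string::npos || open > end) return "";
    const std::size_t close = bytes.find(bytes[open], open + 1);
    if (close == std::string::npos || close > end) return "";
    return bytes.substr(open + 1, close - open - 1);
}

// Assumption for this sketch: with no declared encoding, treat the
// document as UTF-8 (UTF-16 detection is deliberately left out).
std::string effective_encoding(const std::string& bytes) {
    const std::string enc = declared_encoding(bytes);
    return enc.empty() ? "UTF-8" : enc;
}

// The two situations from the original question.
void check_examples() {
    const std::string first =
        "\xEF\xBB\xBF<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>";
    const std::string second = "\xEF\xBB\xBF<a/>";
    assert(has_utf8_bom(first) && has_utf8_bom(second));
    assert(declared_encoding(first) == "UTF-8");
    assert(declared_encoding(second).empty());
    assert(effective_encoding(first) == effective_encoding(second));
}
```

    A tool that only trusts the declaration would treat these two files
    differently, which matches the behaviour described in the question.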


    > When I search that directory for a character from a different
    > language using a third-party desktop search GUI, it reports that the
    > character exists in the first file (which also has the XML
    > declaration), but not in the second file (which has only the BOM).


    OK, so you have a problem with your broken third-party application.
    How is that related to C++?


    > Initially I thought the problem was mainly because of UTF-8
    > supporting both multibyte and Unicode, but I could not find much on it.


    Like most of your message, what you say just doesn't make much sense.


    > Please help.


    Getting a basic understanding of what Unicode and its encoding formats
    are would surely help.
    loufoque, Oct 18, 2006
    #4
  5. loufoque Guest

    Ron Natalie wrote:

    > the Unicode (which effectively
    > is a 32 bit character space)


    Unicode only reserves 2^20 + 2^16 mappings.
    21 bits is more than enough to store that.
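
    The arithmetic can be checked at compile time. The sketch below uses
    the fact that the highest code point is U+10FFFF (17 planes of 65,536
    code points each); the constant and function names are made up for
    this example.

```cpp
#include <cassert>
#include <cstdint>

// Highest Unicode code point and the size of the code space.
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;
constexpr std::uint32_t kCodePointCount = kMaxCodePoint + 1;

// 17 planes of 2^16 code points each, i.e. 2^20 + 2^16 in total.
static_assert(kCodePointCount == 17u * 65536u, "17 planes of 65,536");
static_assert(kCodePointCount == (1u << 20) + (1u << 16), "2^20 + 2^16");

// 20 bits are not enough, 21 bits are.
static_assert(kMaxCodePoint >= (1u << 20), "does not fit in 20 bits");
static_assert(kMaxCodePoint < (1u << 21), "fits in 21 bits");

// Number of bits needed to represent a value (0 needs 0 bits here).
int bits_needed(std::uint32_t value) {
    int bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

// Runtime version of the same check.
void check_bit_width() {
    assert(bits_needed(kMaxCodePoint) == 21);
}
```
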
    loufoque, Oct 18, 2006
    #5
  6. Peter Jansson Guest

    wrote:
    > I know this has nothing to do with C++ in particular, but where better
    > to ask such a question?


    The statement above is the best I have seen in a long time here.

    If you know your question has "nothing to do with C++ in particular",
    then why do you ask in a newsgroup dedicated to the C++ language? That
    is like asking for help with your car in a bicycle shop.

    You will probably get a much better response if you ask in a forum
    dedicated to your problem.

    Sincerely,

    Peter Jansson
    http://www.p-jansson.com/
    http://www.jansson.net/
    Peter Jansson, Oct 18, 2006
    #6
  7. Bhushan Guest

    Check your third-party search tool's documentation for how it searches
    XML files.


    wrote:
    > I know this has nothing to do with C++ in particular, but where better
    > to ask such a question?
    >
    > Anyways,
    > >your problem is your document isn't conforming with the document
    > > rules that the search program is using.

    >
    > I do not understand what you mean by this.
    > Of course I cannot do anything about the search program (which is
    > surely using Unicode).
    >
    > But the question is: if both files are in UTF-8 format, why does the
    > search program work only for the one that also has UTF-8 in its XML
    > declaration?
    > Does it really make any difference in this regard?
    >
    > Thanks for your reply.
    >
    > Shreshth
    >
    >
    > Ron Natalie wrote:
    > > wrote:
    > >
    > > >
    > > > Initially I thought the problem was mainly because of UTF-8
    > > > supporting both multibyte and Unicode, but I could not find much on it.
    > > >
    > > >

    > > What does this have to do with C++ at all?
    > > UTF-8 is a multibyte encoding of the Unicode (which effectively
    > > is a 32 bit character space) but I doubt that's your problem.
    > > Your problem is your document isn't conforming with the document
    > > rules that the search program is using.
    Bhushan, Oct 19, 2006
    #7