Question about using SHIFT-JIS encoding with libxml2

Discussion in 'XML' started by saumya.agarwal@gmail.com, Apr 10, 2007.

  1. Guest

    Hi,

    I am using libxml2 for xml parsing. When the client application sends
    data to libxml2 in UTF-8 format, it works fine.

    But I have a scenario in which the client application sends data to
    the libxml2 parser in SHIFT-JIS format.

    libxml2 then reports the following error:

    "Parsing error in results: Input is not proper UTF-8, indicate
    encoding !"

    In libxml2 documentation at http://www.xmlsoft.org/encoding.html I
    read that libxml2 can support any encoding by calling the
    xmlSwitchEncoding() routine.
    What do I have to do to make libxml2 support SHIFT-JIS format? I want
    to continue supporting UTF-8 also.


    Thanks,
    Saumya
     
    , Apr 10, 2007
    #1

  2. wrote:

    > But I have a scenario in which the client application sends data to
    > the libxml2 parser in SHIFT-JIS format.
    >
    > libxml2 then reports the following error:
    >
    > "Parsing error in results: Input is not proper UTF-8, indicate
    > encoding !"


    Does the XML contain an XML declaration indicating the encoding e.g.
    <?xml version="1.0" encoding="SHIFT-JIS"?>
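
    One quick way to check what a document actually declares is to read the encoding name out of its first line. The following is only a minimal sketch: declared_encoding is a hypothetical helper, not part of libxml2, and it assumes a NUL-terminated, ASCII-compatible buffer (which covers both UTF-8 and Shift_JIS input). Note that the IANA-registered spelling of the name is "Shift_JIS".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxml2 function): copy the encoding named in a
 * leading XML declaration into `out` (capacity `outcap`).
 * Returns 1 if a name was found, 0 otherwise.
 * Assumes `xml` is NUL-terminated and ASCII-compatible. */
int declared_encoding(const char *xml, char *out, size_t outcap)
{
    /* The declaration, if present, must be the very first thing in the input. */
    if (outcap == 0 || strncmp(xml, "<?xml", 5) != 0)
        return 0;

    const char *end = strstr(xml, "?>");        /* end of the declaration */
    const char *p = strstr(xml, "encoding");    /* pseudo-attribute name  */
    if (end == NULL || p == NULL || p > end)
        return 0;

    p += strlen("encoding");
    p += strspn(p, " \t\r\n");                  /* optional space before '=' */
    if (*p != '=')
        return 0;
    p++;
    p += strspn(p, " \t\r\n");                  /* optional space after '=' */
    if (*p != '"' && *p != '\'')
        return 0;

    char quote = *p++;
    const char *q = strchr(p, quote);           /* matching closing quote */
    if (q == NULL || q > end)
        return 0;

    size_t n = (size_t)(q - p);
    if (n >= outcap)
        return 0;                               /* name does not fit in `out` */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

    If this returns 0 for the bytes the client actually sends, the parser has no declaration to go on and will assume UTF-8.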

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Apr 10, 2007
    #2

  3. Guest

    > Does the XML contain an XML declaration indicating the encoding e.g.
    > <?xml version="1.0" encoding="SHIFT-JIS"?>


    Yes, it does. I thought that should be enough to tell the libxml2
    parser that the encoding is SHIFT-JIS.
    Does libxml2 support the SHIFT-JIS encoding? I want to keep the support
    for UTF-8 intact too. Is that possible?
    Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is declared
    in the XML declaration as above?

    Thanks,
    Saumya

     
    , Apr 11, 2007
    #3
  4. Matej Cepl Guest

    On Tue, 10 Apr 2007 22:13:25 -0700, scripst:
    > Yes, it does. I thought that should be enough to tell the libxml2
    > parser that the encoding is SHIFT-JIS. Does libxml2 support the
    > SHIFT-JIS encoding? I want to keep the support for UTF-8 intact too. Is
    > that possible? Does libxml2 convert SHIFT-JIS to UTF-8 internally if it
    > is declared in the XML declaration as above?


    This looks promising (and yes, do read both referenced tutorials)
    http://xmlsoft.org/encoding.html

    Matej
     
    Matej Cepl, Apr 11, 2007
    #4
  5. wrote:
    > Does libxml2 support SHIFT-JIS encoding ?


    I don't know offhand. Find its documentation?

    > Does libxml2 convert SHIFT-JIS to UTF-8 internally if it is supplied
    > in XML declaration as above?


    Most Java-based XML processors actually convert to UTF-16 internally,
    since that's a native character representation in Java. I don't know
    what libxml2 is using, but I would expect they're doing something
    similar -- convert to some standardized internal form, process that,
    then convert back. Some tools have tried to avoid the double conversion
    when data is being passed straight through, but recognizing and taking
    advantage of that optimization is not easy.
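
    For what it's worth, the libxml2 encoding page linked in this thread says the library works in UTF-8 internally, so input in any other encoding is converted on the way in. The conversion step itself can be sketched with plain POSIX iconv. This is only an illustration of the idea, not how libxml2 is implemented: to_utf8 is a hypothetical helper, and whether a converter for a given name (for example "SHIFT_JIS") exists depends on the iconv installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` bytes of `in`, encoded as `from`,
 * into UTF-8 in `out` (capacity `outcap`).
 * Returns the number of bytes written, or -1 if this iconv has no converter
 * for `from`, the input is invalid, or `out` is too small. */
long to_utf8(const char *from, const char *in, size_t inlen,
             char *out, size_t outcap)
{
    iconv_t cd = iconv_open("UTF-8", from);     /* (to, from) order */
    if (cd == (iconv_t)-1)
        return -1;                              /* encoding not available */

    char *inp = (char *)in;     /* iconv() takes char **, bytes are not modified */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                              /* EILSEQ, EINVAL or E2BIG */
    return (long)(outcap - outleft);
}
```

    A call such as to_utf8("SHIFT_JIS", buf, len, out, sizeof out) returning -1 is a hint that the local iconv cannot do the conversion, in which case a parser relying on iconv cannot do it either.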

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
     
    Joe Kesselman, Apr 11, 2007
    #5
  6. On Apr 11, 7:13 am, ""
    <> wrote:
    > Does libxml2 support SHIFT-JIS encoding ? I want to keep the support
    > for UTF-8 intact too. Is it possible?


    For what it's worth, the source code contains the following (in
    version 2.6.27):

        case XML_CHAR_ENCODING_2022_JP:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "ISO-2022-JP", NULL);
            break;
        case XML_CHAR_ENCODING_SHIFT_JIS:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "Shift_JIS", NULL);
            break;
        case XML_CHAR_ENCODING_EUC_JP:
            __xmlErrEncoding(ctxt, XML_ERR_UNSUPPORTED_ENCODING,
                             "encoding not supported %s\n",
                             BAD_CAST "EUC-JP", NULL);
            break;
     
    Arndt Jonasson, Apr 12, 2007
    #6
  7. Matej Cepl Guest

    "Arndt Jonasson" <> writes:
    > For what it's worth, the source code contains the following (in
    > version 2.6.27):


    However, according to the web page I linked earlier in this
    thread, libxml2 can use iconv and every encoding iconv supports
    (i.e., anything you have ever dreamed of).

    Matej
     
    Matej Cepl, Apr 12, 2007
    #7