Mapforce: mapping to CSV without column header line inserts hex FF FE FF FE

Discussion in 'XML' started by Lukas, Dec 9, 2005.

  1. Lukas

    Lukas Guest

    Hi Group,

    In Mapforce 2005 R3, when mapping to CSV with the "First row contains
    field names" option UN-checked on the CSV target component settings,
    the characters (hex) FF FE FF FE are inserted in the beginning of the
    first line when running Java code autogenerated by Mapforce.

    In the output tab of the Mapforce application, this problem doesn't
    occur. I've not checked whether it occurs when running C#, C++ or XSLT
    autogenerated code.

    I've encountered this problem when mapping XML to CSV and CSV to CSV.

    Does anyone know whether this is a known bug? Is it fixed in a
    later release?
    Any known workarounds?

    Not holding my breath,

    Lukas
     
    Lukas, Dec 9, 2005
    #1

  2. Lukas

    Lukas Guest

    Correction:

    My editor was displaying those bytes incorrectly.
    The bytes inserted are actually:

    EF BB BF
     
    Lukas, Dec 12, 2005
    #2

  3. Peter Flynn

    Peter Flynn Guest

    Lukas wrote:

    > Hi Group,
    >
    > In Mapforce 2005 R3, when mapping to CSV with the "First row contains
    > field names" option UN-checked on the CSV target component settings,
    > the characters (hex) FF FE FF FE are inserted in the beginning of the
    > first line when running Java code autogenerated by Mapforce.
    >
    > In the output tab of the Mapforce application, this problem doesn't
    > occur. I've not checked whether it occurs when running C#,C++ or XSLT
    > autogenerated code.
    >
    > I've encountered this problem when mapping XML to CSV and CSV to CSV.
    >
    > Does anyone know whether this is a known bug? Is it fixed in a
    > later release?
    > Any known workarounds?


    It's not a bug, it's part of XML. It's the Byte Order Mark (BOM) which
    is designed to signal to a processor before processing starts which
    16-bit character encoding is in use. It's being output because your
    processor is emitting UCS-2 which is probably unnecessary unless you
    are using a very wide range of character repertoire planes. Check the
    Mapforce output settings and switch to UTF-8 instead.

    ///Peter
    --
    See FAQ: http://xml.silmaril.ie/appendix/glossary/#bom
     
    Peter Flynn, Dec 12, 2005
    #3
  4. In article <>,
    Lukas <> wrote:

    >My editor was displaying those bytes incorrectly.
    >The bytes inserted are actually:
    >
    >EF BB BF


    I can't help you directly, but EF BB BF is the UTF-8 code for a
    byte-order mark (or "BOM"). Maybe you can look that up in the manual
    for your software.

    -- Richard
     
    Richard Tobin, Dec 13, 2005
    #4
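For anyone who wants to check which bytes a generated file really starts with, here is a minimal Java sketch that names the encoding signalled by a leading BOM. The class and method names (`BomSniffer`, `detect`) are made up for illustration and are not part of Mapforce or its generated code; the byte signatures themselves are the standard Unicode ones.

```java
/** Hypothetical helper (not Mapforce API): names the encoding signalled by a leading BOM. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the encoding implied by the first bytes, or "none" if no BOM is recognised. */
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // The UTF-32LE signature begins with FF FE, so test it before UTF-16LE.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "none";
    }

    /** Compares unsigned byte values against the expected prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // First bytes of a CSV file as described in post #2, followed by "a,b".
        byte[] csvHead = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', ',', 'b'};
        System.out.println(detect(csvHead)); // prints UTF-8
    }
}
```

Note that a file starting with FF FE 00 00 could in principle also be UTF-16LE text beginning with a NUL character; the sketch assumes that does not happen in CSV data.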
  5. Lukas

    Lukas Guest

    Sorry for the confusion. The sequence was actually EF BB BF (UTF-8 BOM,
    as Richard notes).

    What confuses me about the UTF-8 BOM issue:

    A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
    xml processing instruction says so explicitly, why would I want to have
    nasty binary at the start of my document?

    B)
    * In Text (CSV): some articles claim that Windows Notepad handles the
    BOM gracefully, but in our project the issue would not even have been
    raised if our editors had not displayed spurious characters:
    ... "ï»¿" (if you view this in ISO 8859-1) in Notepad, a dot in
    Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
    values are displayed throughout the length of the doc.

    * The issue did not occur when (in Mapforce) the option "First row
    contains field names" was checked for the output CSV, although we
    viewed the output files with the same editors.

    * Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
    with BOM from one Mapforce code-gen mapping is fed as input to another,
    the BOM is visible in the first field and trips up functions operating
    on that field.
     
    Lukas, Dec 14, 2005
    #5
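The last point in the post above (a BOM ending up inside the first field of the next mapping's input) can be worked around in hand-written Java by dropping a leading U+FEFF before the first line is parsed. Java's UTF-8 decoder keeps the BOM as an ordinary character, so the consumer has to remove it. This is only a sketch: `BomStripper` and `stripBom` are hypothetical names, and it assumes you can preprocess the text before it reaches the generated mapping code, which the thread does not confirm.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical helper (not Mapforce API): removes a leading BOM from decoded text. */
final class BomStripper {
    /** U+FEFF, the character that the bytes EF BB BF decode to in UTF-8. */
    static final char BOM = '\uFEFF';

    private BomStripper() {}

    /** Returns s without a leading U+FEFF; other strings are returned unchanged. */
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a CSV file that was written with a BOM.
        String csv = BOM + "id,name\n1,x";
        try (BufferedReader in = new BufferedReader(new StringReader(csv))) {
            String firstLine = stripBom(in.readLine());
            // Without stripBom the first field would be 3 characters long.
            System.out.println(firstLine.split(",")[0].length()); // prints 2
        }
    }
}
```

Only the first line of a file needs this treatment, since a BOM is only meaningful at the very start of the stream.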
  6. Lukas

    Lukas Guest

    Sorry, something doesn't display in my last post. It's meant to read:

    ...

    "ï»¿"

    (if you view this in ISO 8859-1) in Notepad, a dot ...
     
    Lukas, Dec 14, 2005
    #6
  7. In article <>,
    Lukas <> wrote:

    >A) In XML: Since I'm using UTF-8, which is a 7 bit encoding, and the
    >xml processing instruction says so explicitly, why would I want to have
    >nasty binary at the start of my document?


    UTF-8 is not a 7-bit encoding! It corresponds to ASCII for characters
    up to 127, but uses bytes with the high bit set to encode the rest of
    Unicode.

    >* In Text (CSV): some articles claim that Windows Notepad handles the
    >BOM gracefully, but in our project the issue would've not even been
    >raised if our editors had not displayed spurious characters;
    >... "ï»¿" (if you view this in ISO 8859-1) in Notepad


    I don't know anything about Notepad, but if you see those characters -
    i with diaeresis, double greater-than, inverted question mark - it
    means that the program is interpreting the document as 8859-1 rather
    than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
    know that it's in UTF-8!

    -- Richard
     
    Richard Tobin, Dec 14, 2005
    #7
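Both points in the reply above can be checked with a few lines of Java: decoding the bytes EF BB BF as ISO 8859-1 produces exactly the three characters described, and UTF-8 only matches ASCII for code points up to 127. The class name `BomMojibake` and its helpers are illustrative only.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative demo: what the UTF-8 BOM looks like when misread as ISO 8859-1. */
final class BomMojibake {
    private BomMojibake() {}

    /** Decodes bytes the way an editor assuming ISO 8859-1 would. */
    static String asLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        // U+00EF (i with diaeresis), U+00BB (double greater-than), U+00BF (inverted question mark)
        System.out.println(asLatin1(bom).equals("\u00EF\u00BB\u00BF")); // prints true

        // ASCII characters keep their single 7-bit value in UTF-8 ...
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_8)));      // prints 41
        // ... but anything above 127 uses bytes with the high bit set.
        System.out.println(hex("\u00E9".getBytes(StandardCharsets.UTF_8))); // prints C3 A9
    }
}
```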
  8. Peter Flynn

    Peter Flynn Guest

    Lukas wrote:

    > Sorry for the confusion. The sequence was actually EF BB BF (UTF-8
    > BOM, as Richard notes).
    >
    > What confuses me about the UTF-8 BOM issue:
    >
    > A) In XML: Since I'm using UTF-8, which is a 7 bit encoding,


    Whoah there. UTF-8 uses all 8 bits in the byte. Where did you get the
    information that it's 7-bit? The only 7-bit encoding in widespread
    use is US-ASCII.

    > and the
    > xml processing instruction says so explicitly, why would I want to
    > have nasty binary at the start of my document?


    To identify that it is UTF-8 as opposed to UTF-16 or UTF-32.
    If your XML software can't handle it, it's broken and should be
    replaced.

    > B)
    > * In Text (CSV): some articles claim that Windows Notepad handles the
    > BOM gracefully, but in our project the issue would've not even been
    > raised if our editors had not displayed spurious characters;
    > ... "ï»¿" (if you view this in ISO 8859-1) in Notepad, a dot in
    > Ultraedit 8.2. When switching to hex in Ultraedit, completely wrong
    > values are being displayed throughout the length of the doc.


    While most plaintext editors will display ASCII or ISO-8859-1
    adequately, large numbers of them spit blood when faced with anything
    else. Notepad is suitable for shopping lists and not much else.

    > * The issue did not occur when (in Mapforce) the option "First row
    > contains field names" was checked for the output CSV, although we
    > viewed the output files with the same editors.
    >
    > * Mapforce ITSELF doesn't handle the BOM gracefully. If the CSV output
    > with BOM from one Mapforce code-gen mapping is fed as input to
    > another, the BOM is visible in the first field and trips up functions
    > operating on that field.


    Sounds like Mapforce is broken and you should complain to the vendor.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Dec 14, 2005
    #8
  9. In <dnp505$kl3$>, on 12/14/2005
    at 12:59 PM, (Richard Tobin) said:

    >I don't know anything about Notepad, but if you see those characters
    >-
    >i with diaeresis, double greater-than, inverted question mark - it
    >means that the program is interpreting the document as 8859-1 rather
    >than UTF-8. Of course, the whole point of the UTF-8 BOM is to let it
    >know that it's in UTF-8!


    Why would you need a BOM for UTF-8? It's only needed for characters
    larger than an octet, e.g., UTF-16, raw UCS4.

    --
    Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>

    Unsolicited bulk E-mail subject to legal action. I reserve the
    right to publicly post or ridicule any abusive E-mail. Reply to
    domain Patriot dot net user shmuel+news to contact me. Do not
    reply to
     
    Shmuel (Seymour J.) Metz, Dec 19, 2005
    #9
  10. In article <43a6ba3b$28$fuzhry+tra$>,
    Shmuel (Seymour J.) Metz <> wrote:

    >Why would you need a BOM for UTF-8? It's only needed for characters
    >larger than an octet, e.g., UTF-16, raw UCS4.


    It also serves to indicate the encoding, as well as which byte-order
    variant.

    -- Richard
     
    Richard Tobin, Dec 19, 2005
    #10
  11. In <do76tp$239d$>, on 12/19/2005
    at 08:58 PM, (Richard Tobin) said:

    >It also serves to indicate the encoding, as well as which byte-order
    >variant


    What byte-order variant? UTF-8 uses a stream of 8-bit bytes (octets),
    not a stream of 16-bit bytes; there is no byte ordering issue. The BOM
    is needed for UTF-16 and raw Unicode, not for UTF-8.

    --
    Shmuel (Seymour J.) Metz, SysProg and JOAT <http://patriot.net/~shmuel>
     
    Shmuel (Seymour J.) Metz, Jan 3, 2006
    #11
  12. In article <43ba752d$23$fuzhry+tra$>,
    Shmuel (Seymour J.) Metz <> wrote:

    >>It also serves to indicate the encoding, as well as which byte-order
    >>variant


    >What byte-order variant? UTF-8 uses a stream of 8-bit bytes (octets),
    >not a stream of 16-bit bytes; there is no byte ordering issue.


    The obvious use of a BOM - as the name implies - is to indicate which
    byte order variant of an encoding is being used. It is *also* used to
    indicate the encoding itself. Obviously for UTF-8 only this second
    function is relevant.

    -- Richard
     
    Richard Tobin, Jan 4, 2006
    #12
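The distinction argued over in the last few posts can be seen by encoding the single character U+FEFF in each encoding. A small Java sketch (the class name `BomBytes` is illustrative): the two UTF-16 forms differ only in byte order, while UTF-8 has exactly one form, so there the mark can only act as an encoding signature.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative demo: the bytes of U+FEFF in UTF-8 and in both UTF-16 byte orders. */
final class BomBytes {
    private BomBytes() {}

    /** Encodes U+FEFF in the given charset and returns space-separated hex pairs. */
    static String bomHex(Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : "\uFEFF".getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bomHex(StandardCharsets.UTF_16BE)); // prints FE FF
        System.out.println(bomHex(StandardCharsets.UTF_16LE)); // prints FF FE
        System.out.println(bomHex(StandardCharsets.UTF_8));    // prints EF BB BF
    }
}
```

The explicit `UTF_16BE` and `UTF_16LE` charsets are used here because they encode the character as given, without adding a byte-order mark of their own.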