Newbie q: Parsing vendor-data into uniform XML

Discussion in 'XML' started by Casper B, Jan 9, 2005.

  1. Casper B

    Casper B Guest

    I have 3-4 vendor-specific ASCII (non-XML) data sheets, forming
    tables of simple types (int, float, string) with space as the
    delimiter. The data is simple (from a grammar point of view), yet not
    as simple as a 2D array/recordset. Example:

    1234567894 00000100 50 10400
    01330002 003 0000213337 10400
    01330025 002 0000066887 10400
    01330027 000 0000033841 10400
    01330029 001 0000061182 10400
    01330030 004 0000047411 10400
    9999999998 0001165422- 10400
    1234567894 00000100 50 10400
    01330003 001 0000033671- 10400
    01330004 001 0000116653- 10400
    ....looped data!

    Normally I would parse this and do the transformation using a
    compiler-compiler. This is, however, a very static approach (a new
    format would require recompilation, etc.) and certainly not suited for
    database integration.

    Can I somehow use XML or any features thereof (DTD, XPath...) to
    parse/validate vendor-specific ASCII/non-XML data sheets and transform
    them into a standard XML format?

    The goal is, of course, to be able to receive vendor data in a new
    proprietary ASCII format and still be able to read the data, provided an
    associated grammar has been created for this new format. Unfortunately I
    have no way of requiring the vendor to provide/follow a schema/XML
    format. :(

    Thanks in advance for any feedback!

    Casper Bang
     
    Casper B, Jan 9, 2005
    #1

  2. Casper B

    Andy Fish Guest

    "Casper B" <> wrote in message
    news:41e115f8$0$198$...
    > I have 3-4 vendor-specific ASCII (non-XML) data sheets, forming
    > tables of simple types (int, float, string) with space as the delimiter.
    > The data is simple (from a grammar point of view), yet not as simple as a
    > 2D array/recordset. Example:
    >
    > 1234567894 00000100 50 10400
    > 01330002 003 0000213337 10400
    > 01330025 002 0000066887 10400
    > 01330027 000 0000033841 10400
    > 01330029 001 0000061182 10400
    > 01330030 004 0000047411 10400
    > 9999999998 0001165422- 10400
    > 1234567894 00000100 50 10400
    > 01330003 001 0000033671- 10400
    > 01330004 001 0000116653- 10400
    > ...looped data!
    >
    > Normally I would parse this and do the transformation using a
    > compiler-compiler. This is, however, a very static approach (a new
    > format would require recompilation, etc.) and certainly not suited for
    > database integration.
    >
    > Can I somehow use XML or any features thereof (DTD, XPath...) to
    > parse/validate vendor-specific ASCII/non-XML data sheets and transform
    > them into a standard XML format?
    >


    You might be able to process these files using XML tools, but it certainly
    wouldn't help with the job of parsing them.

    In XML, any data between tags is represented as text nodes, so all you would
    end up with would be either a single text node or a sequence of text nodes.
    You would still have to use substring() or instr() type operations to locate
    the individual fields. This would be more complicated in, say, XSLT code
    than it would be in a conventional 3GL.
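
    A small sketch of that point, in Python (chosen only for illustration;
    the wrapper element name "data" is made up). Wrapping the raw rows in a
    tag gives a well-formed document, but the parser hands back one block of
    text and the fields still have to be split by hand:

```python
import xml.etree.ElementTree as ET

# Two rows taken from the sample in the original post.
raw = (
    "01330002 003 0000213337 10400\n"
    "01330025 002 0000066887 10400\n"
)

# Wrapping the payload in a hypothetical <data> element makes it well-formed XML...
doc = ET.fromstring("<data>" + raw + "</data>")

# ...but the parser sees no structure: zero child elements, one block of text.
child_count = len(doc)

# Locating the individual fields is still ordinary string handling.
rows = [line.split() for line in doc.text.splitlines() if line.strip()]
```

    Here child_count is 0 and rows is built entirely by split(), so the XML
    layer has contributed nothing to the parsing itself.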

    I think you need to treat the parsing of the incoming non-XML data as a
    separate process. Once you have done that, you can certainly build XML
    structures and use XML tools to process and output the data.
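
    A minimal sketch of that two-step approach, in Python. The record layout
    is an assumption read off the sample alone: a line starting with
    1234567894 is taken to open a batch, a line starting with 9999999998 to
    close it with a total, anything else to be a detail row, and a trailing
    "-" to mark a negative amount. The element and attribute names (batches,
    batch, item, total, id, seq, header) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed markers, inferred only from the sample data in the thread.
HEADER_ID = "1234567894"
TRAILER_ID = "9999999998"

SAMPLE = """\
1234567894 00000100 50 10400
01330002 003 0000213337 10400
01330025 002 0000066887 10400
01330027 000 0000033841 10400
01330029 001 0000061182 10400
01330030 004 0000047411 10400
9999999998 0001165422- 10400
1234567894 00000100 50 10400
01330003 001 0000033671- 10400
01330004 001 0000116653- 10400
"""


def signed(field):
    """Convert a zero-padded field with an optional trailing '-' to an int."""
    if field.endswith("-"):
        return -int(field[:-1])
    return int(field)


def parse_vendor_lines(lines):
    """Step 1: plain-text parsing. Step 2: build an XML tree from the result."""
    root = ET.Element("batches")
    batch = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == HEADER_ID:
            # Meaning of the remaining header fields is unknown, so keep them verbatim.
            batch = ET.SubElement(root, "batch", header=" ".join(fields[1:]))
        elif batch is None:
            raise ValueError(f"line {number}: record outside of a batch")
        elif fields[0] == TRAILER_ID:
            ET.SubElement(batch, "total").text = str(signed(fields[1]))
        else:
            item = ET.SubElement(batch, "item", id=fields[0], seq=fields[1])
            item.text = str(signed(fields[2]))
    return root


root = parse_vendor_lines(SAMPLE.splitlines())
xml_text = ET.tostring(root, encoding="unicode")
```

    From here on, root (or xml_text) can be handed to ordinary XML tooling
    such as XPath queries or an XSLT step. A new vendor format would need its
    own version of parse_vendor_lines, but everything downstream stays the
    same.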


    > The goal is, of course, to be able to receive vendor data in a new
    > proprietary ASCII format and still be able to read the data, provided an
    > associated grammar has been created for this new format. Unfortunately I
    > have no way of requiring the vendor to provide/follow a schema/XML
    > format. :(
    >
    > Thanks in advance for any feedback!
    >
    > Casper Bang
    >
     
    Andy Fish, Jan 10, 2005
    #2

  3. Casper B

    Casper B Guest

    Thought so, thanks for the clarification! :)

    Casper
     
    Casper B, Jan 10, 2005
    #3
  4. Casper B

    eranb Guest

    Hi,
    To handle the parsing side, I would recommend taking a look at
    ContentMaster, ItemField's file parsing solution. Using its parser
    studio, a parsing solution for the scenario you have just described can
    be created in minutes.

    http://www.itemfield.com

    ContentMaster is a complete multi-format (EDI, Excel, Word, RTF, custom
    formats, etc.) text parsing solution that comes with a dedicated visual
    authoring environment for the creation of parsing scripts, and a parsing
    engine that seamlessly integrates into any environment.

    Regards,

    Eran Berkowitz
     
    eranb, Jan 13, 2005
    #4
