Efficiently Parsing Data

Discussion in 'C++' started by Jasper, Dec 14, 2007.

  1. Jasper

    Jasper Guest

    Hi,

    I have multiple data files which need parsing in realtime so high
    performance is *crucial*.

    I dont have a format definition, but from what I can see there is a
    hierarchy of data.
    Each data field is named thus <"name":> (the <> are mine).
    The data can be quoted text or unquoted text or a composite hierarcy field.
    Each name/data pair is terminated by a comma unless it is the last in the
    group.

    A comma can also appear within a quoted text data field.

    The hierarchical tokens are open and close braces <{}> and open and
    close square brackets <[]>.

    Thats all there is to it :)

    The data describes, say, a school class, so we have a rigid set of data
    groups.
    eg we have data describing the teacher, data describing the class taken, and
    a repeating group describing each kid and grades.

    So it would be nice to be able to parse this data out into appropriate
    structures.

    Below is a snipped of dummy data (in reality there is much more). I have
    added the spacing and carriage returns for clarity. The real data has no
    white spaces. There may be a variable number of parameters (I think) so it
    would be useful to be able to ID and potentially store the variable name
    with its data value.

    Anyone got any ideas/code snips/references of the best, most speedy (at run
    time), way to go about it? A tight, pure c++ solution (with or without the
    stl) would be needed.

    Thanks in advance for any help


    {

    "teacher":{
    "name":
    "Mr Borat",
    "age":
    "35",
    "Nationality":
    "Kazakhstan"},


    "Class":{
    "Semester":
    "Summer",
    "Room":
    null,
    "Subject":
    "Politics",
    "Notes":
    "We're happy, you happy?"},

    "Students":
    [
    {
    "Smith":
    [{"First Name":"Mary","sex":"Female}],
    "Brown":
    [{"First Name":"John","sex":"Male}],
    "Jackson":
    [{"First Name":"Jackie","sex":"Female}]
    }
    ],


    "Grades":
    [
    {
    "Test":
    [{"grade":A,"points":68},{"grade":B,"points":25},{"grade":C,"points":15}],
    "Test":
    [{"grade":C,"points":2},{"grade":B,"points":29},{"grade":A,"points":55}],
    "Test":
    [{"grade":C,"points":2},{"grade":A,"points":72},{"grade":A,"points":65}]
    }
    ]

    }
     
    Jasper, Dec 14, 2007
    #1
    1. Advertising

  2. Jasper wrote:
    > I have multiple data files which need parsing in realtime so high
    > performance is *crucial*.
    >
    > I dont have a format definition, but from what I can see there is a
    > hierarchy of data.


    You better come up with a definition, otherwise you're programming
    without a spec. Even if you are reverse-engineering, you need to
    begin by writing a specification. A good spec gets you half way
    to the solution.

    Once you have the definition, you can write a flex/yacc grammar for
    it, and then you generate the code that handles that file. Simple
    as that.

    > [..]
    > Anyone got any ideas/code snips/references of the best, most speedy
    > (at run time), way to go about it? A tight, pure c++ solution (with
    > or without the stl) would be needed.


    I can only say, good luck with your homework!

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Dec 14, 2007
    #2
    1. Advertising

  3. Jasper

    Jasper Guest


    > You better come up with a definition, otherwise you're programming
    > without a spec. Even if you are reverse-engineering, you need to
    > begin by writing a specification. A good spec gets you half way
    > to the solution.


    The only thing that defines the data as far as I can see is it's
    hierarchical structure as given by the bracket and square bracket.
    The token names and data values are irrelevant to this.

    > Once you have the definition, you can write a flex/yacc grammar for
    > it, and then you generate the code that handles that file. Simple
    > as that.


    I'm not writing a compiler. Maybe flex/yacc can help, I'm not familar with
    it, but it seems a bit of overkill for what I want.
    I wondered if I could adapt a lightweight XML parser, but all I need is a
    sort of DOM based on the "{}" and "[]"


    >> [..]
    >> Anyone got any ideas/code snips/references of the best, most speedy
    >> (at run time), way to go about it? A tight, pure c++ solution (with
    >> or without the stl) would be needed.

    >
    > I can only say, good luck with your homework!
    >


    Homework? What's that supposed to mean?
     
    Jasper, Dec 14, 2007
    #3
  4. Jasper

    anon Guest

    Jasper wrote:
    >> Once you have the definition, you can write a flex/yacc grammar for
    >> it, and then you generate the code that handles that file. Simple
    >> as that.

    >
    > I'm not writing a compiler. Maybe flex/yacc can help, I'm not familar with
    > it, but it seems a bit of overkill for what I want.
    > I wondered if I could adapt a lightweight XML parser, but all I need is a
    > sort of DOM based on the "{}" and "[]"


    Maybe this can help you:
    http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html

    >
    >
    >>> [..]
    >>> Anyone got any ideas/code snips/references of the best, most speedy
    >>> (at run time), way to go about it? A tight, pure c++ solution (with
    >>> or without the stl) would be needed.

    >> I can only say, good luck with your homework!
    >>

    >
    > Homework? What's that supposed to mean?


    Here is definition:
    http://en.wikipedia.org/wiki/Homework
     
    anon, Dec 14, 2007
    #4
  5. Jasper

    Jasper Guest

    "anon" <> wrote in message
    news:fjtgb9$ndn$...
    > Jasper wrote:
    >>> Once you have the definition, you can write a flex/yacc grammar for
    >>> it, and then you generate the code that handles that file. Simple
    >>> as that.

    >>
    >> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familar
    >> with it, but it seems a bit of overkill for what I want.
    >> I wondered if I could adapt a lightweight XML parser, but all I need is
    >> a sort of DOM based on the "{}" and "[]"

    >
    > Maybe this can help you:
    > http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
    >


    Thanks, I'll take a look. If you kow about the tool and XML (I assume) - I
    dont.
    Will I have to rewrite the parser to handle the brackets or are they part of
    the XML spec (in some way)?
    (I'm just asking for a "quickstart").


    >>
    >> Homework? What's that supposed to mean?

    >
    > Here is definition:
    > http://en.wikipedia.org/wiki/Homework



    Oh thats what it is. Thanls
     
    Jasper, Dec 14, 2007
    #5
  6. Jasper

    anon Guest

    Jasper wrote:
    > "anon" <> wrote in message
    > news:fjtgb9$ndn$...
    >> Jasper wrote:
    >>>> Once you have the definition, you can write a flex/yacc grammar for
    >>>> it, and then you generate the code that handles that file. Simple
    >>>> as that.
    >>> I'm not writing a compiler. Maybe flex/yacc can help, I'm not familar
    >>> with it, but it seems a bit of overkill for what I want.
    >>> I wondered if I could adapt a lightweight XML parser, but all I need is
    >>> a sort of DOM based on the "{}" and "[]"

    >> Maybe this can help you:
    >> http://iridia.ulb.ac.be/~fvandenb/tools/xmlParser.html
    >>

    >
    > Thanks, I'll take a look. If you kow about the tool and XML (I assume) - I
    > dont.


    There is a tutorial, explaining xml format and the library.

    > Will I have to rewrite the parser to handle the brackets or are they part of
    > the XML spec (in some way)?
    > (I'm just asking for a "quickstart").


    I don't know about brackets. Maybe
     
    anon, Dec 14, 2007
    #6
  7. Frank Bergemann, Dec 14, 2007
    #7
  8. Jasper wrote:

    > I wondered if I could adapt a lightweight XML parser, but all I need is a


    Wrong turn. I know, today it's XML for any conceivable i/o situation all
    hyped up so that people hardly know that software can actually exist
    that doesn't use XML, but still. You said you want something fast (and
    probably simple).

    > sort of DOM based on the "{}" and "[]"


    You can easily write a simple recursive-descent parser for that. A
    recursive descent parser is just another name for the intuitive solution.
     
    Matthias Buelow, Dec 14, 2007
    #8
  9. Jasper

    Jasper Guest

    "Matthias Buelow" <> wrote in message
    news:...
    > Jasper wrote:
    >
    >> I wondered if I could adapt a lightweight XML parser, but all I need is
    >> a

    >
    > Wrong turn. I know, today it's XML for any conceivable i/o situation all
    > hyped up so that people hardly know that software can actually exist
    > that doesn't use XML, but still. You said you want something fast (and
    > probably simple).


    Actually, I have just discovered that the format of the data is JSON.
     
    Jasper, Dec 14, 2007
    #9
  10. Jasper

    Guest

    On Dec 14, 6:34 am, "Jasper" <> wrote:
    > "Matthias Buelow" <> wrote in message
    >
    > news:...
    >
    > > Jasper wrote:

    >
    > >> I wondered if I could adapt a lightweight XML parser, but all I need is
    > >> a

    >
    > > Wrong turn. I know, today it's XML for any conceivable i/o situation all
    > > hyped up so that people hardly know that software can actually exist
    > > that doesn't use XML, but still. You said you want something fast (and
    > > probably simple).

    >
    > Actually, I have just discovered that the format of the data is JSON.


    As someone already mentioned, figuring out the format gets you well on
    your
    way to a solution. Google JSON C++ and you will see that there are
    existing
    solutions written to parse that format. So now the next step is to
    determine
    whether that work is sufficient for your task.
     
    , Dec 15, 2007
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jane Austine
    Replies:
    14
    Views:
    806
    Dennis Lee Bieber
    Oct 9, 2004
  2. Jane Austine
    Replies:
    2
    Views:
    467
    Changjune Kim
    Oct 5, 2004
  3. taa
    Replies:
    0
    Views:
    245
  4. per
    Replies:
    2
    Views:
    630
  5. Replies:
    14
    Views:
    245
    J. Gleixner
    Jan 19, 2006
Loading...

Share This Page