minimal xml parser?

Discussion in 'XML' started by Mike, Oct 25, 2004.

  1. Mike

    Mike Guest

    Does anyone know of a minimal/mini/tiny/small xml parser
    in c? I'm looking for something small that accepts a stream
    or string, builds a c structure, and then returns an opaque
    pointer to that structure. There should then be a function
    to search that structure given the pointer, tag, and an
    optional attribute. I'm looking initially at only text data,
    no numbers, though eventually there will be some binary
    data (CDATA?).
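
    A minimal sketch of the interface described above: the caller gets back a pointer to a tree and searches it by tag and, optionally, by one attribute. Everything here is hypothetical (xml_node, xml_find and xml_text are made-up names), the parsing step is left out, and each node carries a single attribute to keep it short:

```c
#include <string.h>

/* Hypothetical API sketch. Callers only handle 'xml_node *'; in a real
   library the struct body below would live in the .c file. */
typedef struct xml_node xml_node;

struct xml_node {
    const char *tag;        /* element name */
    const char *attr_key;   /* one attribute for brevity (may be NULL) */
    const char *attr_val;   /* assumed non-NULL whenever attr_key is set */
    const char *text;       /* character content (may be NULL) */
    xml_node *child;        /* first child */
    xml_node *next;         /* next sibling */
};

/* Depth-first search for the first element named 'tag'. If 'key' is not
   NULL, the element must also carry the attribute key="val". */
const xml_node *xml_find(const xml_node *n, const char *tag,
                         const char *key, const char *val)
{
    for (; n != NULL; n = n->next) {
        int match = strcmp(n->tag, tag) == 0;
        if (match && key != NULL)
            match = n->attr_key != NULL &&
                    strcmp(n->attr_key, key) == 0 &&
                    strcmp(n->attr_val, val) == 0;
        if (match)
            return n;
        {
            const xml_node *hit = xml_find(n->child, tag, key, val);
            if (hit != NULL)
                return hit;
        }
    }
    return NULL;
}

/* Accessor, so callers never need the struct layout. */
const char *xml_text(const xml_node *n)
{
    return n ? n->text : NULL;
}
```

    With only `typedef struct xml_node xml_node;` in the public header, the pointer is opaque and the layout can change without touching callers.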

    Thanks.

    Mike
     
    Mike, Oct 25, 2004
    #1

  2. "Mike" <> wrote in message
    news:...
    > I'm looking initially at only text data,
    > no numbers, though eventually there will be some binary
    > data (CDATA?).


    XML does not support "binary data" as the term is commonly used. All data
    within an XML instance must be valid per the specified character encoding.
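
    One common workaround, which is a convention and not something XML itself defines, is to encode binary data as text (for example Base64, RFC 4648) before placing it in an element, and to decode it after parsing. A minimal encoder sketch; b64_encode is a hypothetical name:

```c
#include <stddef.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode 'len' bytes from 'in' as Base64 into 'out', which must hold at
   least 4*((len+2)/3)+1 chars. Returns the number of chars written,
   not counting the terminating NUL. */
size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i, o = 0;

    /* full 3-byte groups become 4 output chars */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = b64_alphabet[(v >> 6) & 63];
        out[o++] = b64_alphabet[v & 63];
    }
    /* 1 or 2 leftover bytes: emit what exists and pad with '=' */
    if (i < len) {
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 63];
        out[o++] = b64_alphabet[(v >> 12) & 63];
        out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```

    The encoded text only uses characters that are legal in any XML encoding, so it can go into an ordinary element or attribute without escaping.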

    You should read the relevant sections of the XML specification before
    determining if XML is an appropriate representation for your requirements.

    /kmc
     
    Keith M. Corbett, Oct 25, 2004
    #2

  3. Mike

    Mike Guest

    In article <>, Keith M. Corbett wrote:
    > "Mike" <> wrote in message
    > news:...
    >> I'm looking initially at only text data,
    >> no numbers, though eventually there will be some binary
    >> data (CDATA?).

    >
    > XML does not support "binary data" as the term is commonly used. All data
    > within an XML instance must be valid per the specified character encoding.
    >
    > You should read the relevant sections of the XML specification before
    > determining if XML is an appropriate representation for your requirements.
    >
    > /kmc
    >
    >


    XML has been chosen, I need to write the parser. Oh, and I do not have
    to validate the XML, just parse it.

    Mike
     
    Mike, Oct 26, 2004
    #3
  4. Mike

    William Park Guest

    Mike <> wrote:
    > XML has been chosen, I need to write the parser. Oh, and I do not have
    > to validate the XML, just parse it.


    Expat (www.libexpat.org). Practically every language has some sort of
    support for it, even Bash shell.

    --
    William Park <>
    Open Geometry Consulting, Toronto, Canada
     
    William Park, Oct 26, 2004
    #4
  5. Mike

    Mike Guest

    In article <>, William Park wrote:
    > Mike <> wrote:
    >> XML has been chosen, I need to write the parser. Oh, and I do not have
    >> to validate the XML, just parse it.

    >
    > Expat (www.libexpat.org). Practically every language has some sort of
    > support for it, even Bash shell.
    >


    Thanks for the expat suggestion. I have also read about libxml. I'd like to
    find a few hundred lines of c code to do this.

    Mike
     
    Mike, Oct 26, 2004
    #5
  6. In article <>,
    Mike <> wrote:
    % In article <>, William Park wrote:
    % > Mike <> wrote:
    % >> XML has been chosen, I need to write the parser. Oh, and I do not have
    % >> to validate the XML, just parse it.
    % >
    % > Expat (www.libexpat.org). Practically every language has some sort of
    % > support for it, even Bash shell.
    % >
    %
    % Thanks for the expat suggestion. I have also read about libxml. I'd like to
    % find a few hundred lines of c code to do this.

    I expect that you won't find a conforming XML parser which is only a few
    hundred lines long. The smallest conforming parsers I know of are expat
    and rxp, and they're in the thousands of lines. There's also tinyxml, which
    is not a conforming parser, and which is still in the thousands of lines.

    Although the temptation to write a minimal ``parser'' yourself may be
    overwhelming, I think you're better off using an existing, conforming,
    parser. Otherwise, you will almost certainly end up with a system that
    rejects valid XML files, and what's the good of that?


    I think you're looking for something like rxp's API.
    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Oct 26, 2004
    #6
  7. Mike wrote:
    > Does anyone know of a minimal/mini/tiny/small xml parser
    > in c? I'm looking for something small that accepts a stream
    > or string, builds a c structure, and then returns an opaque
    > pointer to that structure. There should then be a function
    > to search that structure given the pointer, tag, and an
    > optional attribute. I'm looking initially at only text data,
    > no numbers, though eventually there will be some binary
    > data (CDATA?).


    You could try Mini-XML. See

    http://www.easysw.com/~mike/mxml/

    --
    To reply by e-mail, please remove the extra dot
    in the given address: m.collado -> mcollado
     
    Manuel Collado, Oct 26, 2004
    #7
  8. Mike

    Jeff Kish Guest

    On Mon, 25 Oct 2004 23:50:10 -0000, Mike <> wrote:

    >In article <>, William Park wrote:
    >> Mike <> wrote:
    >>> XML has been chosen, I need to write the parser. Oh, and I do not have
    >>> to validate the XML, just parse it.

    >>
    >> Expat (www.libexpat.org). Practically every language has some sort of
    >> support for it, even Bash shell.
    >>

    >
    >Thanks for the expat suggestion. I have also read about libxml. I'd like to
    >find a few hundred lines of c code to do this.
    >
    >Mike

    you really need the source code? I'm sure you could find a parser in library form ready for you to
    use.
     
    Jeff Kish, Oct 26, 2004
    #8
  9. Mike wrote:
    > Does anyone know of a minimal/mini/tiny/small xml parser
    > in c? I'm looking for something small that accepts a stream
    > or string, builds a c structure, and then returns an opaque
    > pointer to that structure. There should then be a function
    > to search that structure given the pointer, tag, and an
    > optional attribute. I'm looking initially at only text data,
    > no numbers, though eventually there will be some binary
    > data (CDATA?).


    My Mini-XML library might be what you are looking for:

    http://www.easysw.com/~mike/mxml/

    --
    ______________________________________________________________________
    Michael Sweet, Easy Software Products mike at easysw dot com
    Printing Software for UNIX http://www.easysw.com
     
    Michael Sweet, Oct 26, 2004
    #9
  10. Patrick TJ McPhee wrote:
    > ...
    > I expect that you won't find a conforming XML parser which is only a
    > few hundred lines long. The smallest conforming parsers I know of are
    > expat and rxp, and they're in the thousands of lines. There's also
    > tinyxml, which is not a conforming parser, and which is still in the
    > thousands of lines.
    > ...


    It is a myth that conforming XML parsers have to be big; *validating*
    parsers, perhaps, but not a simple non-validating parser which
    accepts XML syntax and encoding.

    Mini-XML started as 696 lines of C code (it has since grown to a
    little over 2700 lines of code) and is a fully conformant XML
    parser that provides everything except validation (and I'm thinking
    how I could do that without bloating it...)

    --
    ______________________________________________________________________
    Michael Sweet, Easy Software Products mike at easysw dot com
    Printing Software for UNIX http://www.easysw.com
     
    Michael Sweet, Oct 26, 2004
    #10
  11. Mike

    William Park Guest

    Mike <> wrote:
    > In article <>, William Park wrote:
    > > Mike <> wrote:
    > >> XML has been chosen, I need to write the parser. Oh, and I do not have
    > >> to validate the XML, just parse it.

    > >
    > > Expat (www.libexpat.org). Practically every language has some sort of
    > > support for it, even Bash shell.
    > >

    >
    > Thanks for the expat suggestion. I have also read about libxml. I'd like to
    > find a few hundred lines of c code to do this.


    Are you talking about actually doing the parsing (duplicating what Expat
    does), or just calling API functions?

    If the former, then I doubt there is one. If the latter, then Gawk,
    Python, and Bash all have a binding to Expat.

    --
    William Park <>
    Open Geometry Consulting, Toronto, Canada
     
    William Park, Oct 27, 2004
    #11
  12. In article <>,
    Michael Sweet <> wrote:

    % Patrick TJ McPhee wrote:
    % > ...
    % > I expect that you won't find a conforming XML parser which is only a
    % > few hundred lines long.

    [...]

    % It is a myth that conforming XML parsers have to be big; *validating*
    % parsers, perhaps, but not a simple non-validating parser which
    % accepts XML syntax and encoding.
    %
    % Mini-XML started as 696 lines of C code (it has since grown to a

    Which is to say, more than a few hundred lines, and it seems like
    it wasn't conforming at that.

    % little over 2700 lines of code) and is a fully conformant XML

    Which is to say, thousands of lines.
    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Oct 27, 2004
    #12
  13. Mike

    Mike Guest

    In article <>, William Park wrote:
    > Mike <> wrote:
    >> In article <>, William Park wrote:
    >> > Mike <> wrote:
    >> >> XML has been chosen, I need to write the parser. Oh, and I do not have
    >> >> to validate the XML, just parse it.
    >> >
    >> > Expat (www.libexpat.org). Practically every language has some sort of
    >> > support for it, even Bash shell.
    >> >

    >>
    >> Thanks for the expat suggestion. I have also read about libxml. I'd like to
    >> find a few hundred lines of c code to do this.

    >
    > Are you talking about actually doing the parsing (duplicating what Expat
    > does), or just calling API functions?
    >
    > If the former, then I doubt there is one. If the latter, then Gawk,
    > Python, and Bash all have a binding to Expat.
    >


    I'm talking about the actual parsing.
     
    Mike, Oct 27, 2004
    #13
  14. Patrick TJ McPhee wrote:
    > In article <>,
    > Michael Sweet <> wrote:
    >
    > % Patrick TJ McPhee wrote:
    > % > ...
    > % > I expect that you won't find a conforming XML parser which is only a
    > % > few hundred lines long.
    >
    > [...]
    >
    > % It is a myth that conforming XML parsers have to be big; *validating*
    > % parsers, perhaps, but not a simple non-validating parser which
    > % accepts XML syntax and encoding.
    > %
    > % Mini-XML started as 696 lines of C code (it has since grown to a
    >
    > Which is to say, more than a few hundred lines, and it seems like
    > it wasn't conforming at that.


    Actually, it was; however, features were added to make it perform
    better and support more use cases.

    > % little over 2700 lines of code) and is a fully conformant XML
    >
    > Which is to say, thousands of lines.


    But still a tiny fraction of the size of other XML parsers out
    there...

    --
    ______________________________________________________________________
    Michael Sweet, Easy Software Products mike at easysw dot com
    Printing Software for UNIX http://www.easysw.com
     
    Michael Sweet, Oct 27, 2004
    #14
  15. In article <>,
    Michael Sweet <> wrote:

    [I wrote]

    % > Which is to say, thousands of lines.
    %
    % But still a tiny fraction of the size of other XML parsers out
    % there...

    Except for the ones I cited in the post you seemed to contradict,
    which are roughly the same size.

    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Oct 28, 2004
    #15
  16. Mike

    cr88192 Guest

    "Mike" <> wrote in message
    news:...
    > Does anyone know of a minimal/mini/tiny/small xml parser
    > in c? I'm looking for something small that accepts a stream
    > or string, builds a c structure, and then returns an opaque
    > pointer to that structure. There should then be a function
    > to search that structure given the pointer, tag, and an
    > optional attribute. I'm looking initially at only text data,
    > no numbers, though eventually there will be some binary
    > data (CDATA?).
    >

    oh well, this thread is new enough that I think I will add my comment.

    if you are motivated, maybe my parser could be made to work in your case.
    kalloc/kfree are for my allocator.
    kralloc is a rotating allocator (it allocates from a large circular
    buffer), and thus its memory does not need freeing.

    ObjType_New can be replaced by kalloc (or malloc if needed).

    be warned: if you replace kalloc or such with malloc, it will be
    necessary to zero the memory returned by malloc (malloc does not
    necessarily do this by default).
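
    If pdlib is not at hand, these calls could be mapped onto the C library. The sketch below is a guess based only on what is said above (kalloc returns zeroed memory; kralloc hands out scratch memory from a circular buffer that is never freed); the bodies and the 1 MB ring size are assumptions, not pdlib's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kalloc: calloc zeroes the memory,
   which plain malloc does not. */
void *kalloc(size_t size)
{
    return calloc(1, size);
}

void kfree(void *p)
{
    free(p);
}

char *kstrdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = kalloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Hypothetical stand-in for kralloc: carve zeroed blocks from a static
   ring and wrap around at the end. Nothing is ever freed, so a block is
   only valid until the ring wraps back over it. Sizes are rounded up to
   16 bytes; the parser only keeps char buffers here. */
#define KRALLOC_RING_SIZE (1 << 20)

void *kralloc(size_t size)
{
    static unsigned char ring[KRALLOC_RING_SIZE];
    static size_t pos = 0;
    void *p;

    size = (size + 15) & ~(size_t)15;
    if (size == 0 || size > KRALLOC_RING_SIZE)
        return NULL;
    if (pos + size > KRALLOC_RING_SIZE)
        pos = 0;                        /* wrap around */
    p = ring + pos;
    pos += size;
    memset(p, 0, size);
    return p;
}
```

    The ring only works if no scratch buffer is still in use when the allocator wraps, so its size has to exceed the scratch memory live at any one time during a parse.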

    I omitted, e.g., the printer here, though...

    part of the header:
    ----
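    #include <stdlib.h>   /* atoi, strtol */
    #include <string.h>   /* strcmp, strcpy, strncmp, strstr, strchr */

    /* kalloc, kfree, kralloc, kstrdup, kprint, ObjType_* and the VFILE
       functions belong to the rest of pdlib and are not shown. */

    #define TOKEN_NULL 0
    #define TOKEN_SPECIAL 1
    #define TOKEN_STRING 2
    #define TOKEN_SYMBOL 3

    typedef struct NetParse_Attr_s NetParse_Attr;
    typedef struct NetParse_Node_s NetParse_Node;

    struct NetParse_Attr_s {
        NetParse_Attr *next;    /* next attribute of the same element */
        char *ns;               /* namespace prefix, or NULL */
        char *key;
        char *value;
    };

    struct NetParse_Node_s {
        NetParse_Node *next;    /* next sibling */
        char *ns;               /* namespace prefix, or NULL */
        char *key;              /* element name; NULL for text and CDATA */
        char *text;             /* character data of text/CDATA nodes */
        NetParse_Attr *attr;    /* attribute list */
        NetParse_Node *first;   /* first child */
    };

    /* used before their definitions below */
    NetParse_Attr *NetParse_NewAttr(void);
    NetParse_Node *NetParse_NewNode(void);
    int NetParse_FreeAttr(NetParse_Attr *attr);
    int NetParse_FreeNode(NetParse_Node *node);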
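
    Since the printer was left out, here is a rough sketch of one for the structures above. NetParse_XML_Print is a hypothetical name, the output is unindented, and processing-instruction nodes (whose key starts with '?') are not treated specially. The two struct definitions are repeated so the block compiles on its own:

```c
#include <stdio.h>
#include <string.h>

typedef struct NetParse_Attr_s NetParse_Attr;
typedef struct NetParse_Node_s NetParse_Node;

struct NetParse_Attr_s {
    NetParse_Attr *next;
    char *ns;
    char *key;
    char *value;
};

struct NetParse_Node_s {
    NetParse_Node *next;
    char *ns;
    char *key;              /* NULL for text and CDATA nodes */
    char *text;
    NetParse_Attr *attr;
    NetParse_Node *first;
};

/* Write s with the characters that are special in XML escaped. */
static void print_escaped(FILE *fp, const char *s)
{
    for (; *s; s++) {
        switch (*s) {
        case '&': fputs("&amp;", fp); break;
        case '<': fputs("&lt;", fp); break;
        case '>': fputs("&gt;", fp); break;
        case '"': fputs("&quot;", fp); break;
        default:  fputc(*s, fp); break;
        }
    }
}

/* Write "prefix:name", or just "name" when there is no prefix. */
static void print_name(FILE *fp, const char *ns, const char *key)
{
    if (ns)
        fprintf(fp, "%s:", ns);
    fputs(key, fp);
}

/* Serialize one node and its whole subtree. */
void NetParse_XML_Print(FILE *fp, const NetParse_Node *n)
{
    const NetParse_Attr *a;
    const NetParse_Node *c;

    if (!n->key) {                          /* text or CDATA node */
        if (n->text)
            print_escaped(fp, n->text);
        return;
    }
    fputc('<', fp);
    print_name(fp, n->ns, n->key);
    for (a = n->attr; a; a = a->next) {
        fputc(' ', fp);
        print_name(fp, a->ns, a->key);
        fputs("=\"", fp);
        print_escaped(fp, a->value);
        fputc('"', fp);
    }
    if (!n->first) {                        /* no children: <name/> */
        fputs("/>", fp);
        return;
    }
    fputc('>', fp);
    for (c = n->first; c; c = c->next)
        NetParse_XML_Print(fp, c);
    fputs("</", fp);
    print_name(fp, n->ns, n->key);
    fputc('>', fp);
}
```

    Text is re-escaped on output because the parser stores it with entities already decoded; CDATA content is written the same way, which is equivalent as far as an XML reader is concerned.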


    dump of part of my parser:
    ----
    /*--
    Cat pdlib;Parse;XML
    Form
    char *NetParse_XML_EatWhite(char *s);
    Description
    Skips over whitespace.
    Status Internal
    --*/
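    char *NetParse_XML_EatWhite(char *s)
    {
        /* 'line' is a global line counter defined elsewhere in pdlib.
           Newlines are overwritten with spaces so that scanning the same
           text again (the tokenizer is often used to peek ahead) does not
           count them twice. The cast keeps bytes >= 0x80 (e.g. UTF-8)
           from being taken as whitespace where char is signed. */
        while(*s && ((unsigned char)*s<=' '))
        {
            if(*s=='\n')
            {
                line++;
                *s=' ';
            }
            s++;
        }

        return(s);
    }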

    /*--
    Cat pdlib;Parse;XML
    Form
    int NetParse_XML_SpecialP(char *s);
    Description
    Returns a nonzero value if *s is special.
    Status Internal
    --*/
    int NetParse_XML_SpecialP(char *s)
    {
        switch(*s)
        {
        case '<':
        case '>':
        case '/':
        case '=':
        case '?':
        case ':':
            return(1);
        default:
            return(0);
        }
    }

    /*--
    Cat pdlib;Parse;XML
    Form
    int NetParse_XML_ContSpecialP(char *s);
    Description
    Returns nonzero if this will get the parsers attention when reading as
    text.
    This includes '<' and '&'.
    Status Internal
    --*/
    int NetParse_XML_ContSpecialP(char *s)
    {
        return((*s=='<') || (*s=='&'));
    }

    /*--
    Cat pdlib;Parse;XML
    Form
    char *NetParse_XML_Token(char *s, char *b, int *t);
    Description
    Reads a token from the XML stream.
    This includes:
    Individual symbols;
    Globs of text/tags;
    Strings.
    b is the buffer.
    t is an integer to hold the token type
    TOKEN_NULL, a null terminator was reached;
    TOKEN_SPECIAL, a special character.
    TOKEN_STRING, a quoted string literal (escapes processed).
    TOKEN_SYMBOL, an unquoted bit of text (eg: a tag).
    Returns the next character after the token.
    Status Internal
    --*/
    char *NetParse_XML_Token(char *s, char *b, int *t)
    {
        char *ob, *is, *t2;
        char ent[16];   /* entity name between '&' and ';' */
        char q;
        int i, dummy;

        is=s;
        if(!t)t=&dummy; /* callers may pass NULL for the type */
        if(!b)b=kralloc(256);
        ob=b;
        *b=0;
        *t=TOKEN_NULL;

        s=NetParse_XML_EatWhite(s);
        if(!*s)
            return(s);

        if(NetParse_XML_SpecialP(s))
        {
            *t=TOKEN_SPECIAL;
            *b++=*s++;
            *b=0;
        }else if((*s=='"') || (*s=='\''))   /* quoted string */
        {
            /* no bounds check here: b must be large enough for the string */
            *t=TOKEN_STRING;
            q=*s++;     /* the string ends only at the matching quote */
            while(*s && (*s!=q))
            {
                if(*s!='&')
                {
                    *b++=*s++;
                    continue;
                }

                /* entity or character reference: &name; */
                s++;
                t2=ent;
                while(*s && (*s!=';') && ((t2-ent)<15))*t2++=*s++;
                if(*s!=';')
                {
                    *t=TOKEN_NULL;
                    return(is);
                }
                *t2=0;
                s++;

                if(ent[0]=='#')
                {
                    /* numeric reference; values above 255 do not fit in
                       one char and are truncated */
                    if(ent[1]=='x')
                        i=(int)strtol(ent+2, NULL, 16);
                    else i=atoi(ent+1);
                    *b++=i;
                }
                else if(!strcmp(ent, "amp"))*b++='&';
                else if(!strcmp(ent, "lt"))*b++='<';
                else if(!strcmp(ent, "gt"))*b++='>';
                else if(!strcmp(ent, "quot"))*b++='"';
                else if(!strcmp(ent, "apos"))*b++='\'';
            }
            if(!*s)
            {
                *t=TOKEN_NULL;
                return(is);
            }
            *b=0;
            s++;        /* skip the closing quote */
        }else
        {
            *t=TOKEN_SYMBOL;
            while(*s && ((unsigned char)*s>' ') && !NetParse_XML_SpecialP(s) &&
                ((b-ob)<254))
                *b++=*s++;
            *b=0;

            if(!*s)
            {
                *t=TOKEN_NULL;
                return(is);
            }
        }
        return(s);
    }
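
    NetParse_XML_Token and NetParse_XML_ParseText store a numeric character reference in a single char, so anything above 255 is truncated. A sketch of decoding the reference body (the text between '&' and ';') into UTF-8 instead; decode_reference and utf8_encode are hypothetical helpers, not part of pdlib:

```c
#include <string.h>

/* Write code point cp to out as UTF-8; returns the byte count, or 0 if
   cp is not a Unicode scalar value. */
static int utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode "amp", "#60", "#x3C", ... into out (at least 4 bytes).
   Returns the number of bytes written, or 0 if not recognized. */
int decode_reference(const char *name, char *out)
{
    static const struct { const char *name; char ch; } predefined[] = {
        { "amp", '&' }, { "lt", '<' }, { "gt", '>' },
        { "quot", '"' }, { "apos", '\'' }
    };
    size_t i;

    if (name[0] == '#') {
        int base = (name[1] == 'x') ? 16 : 10;
        const char *p = name + (base == 16 ? 2 : 1);
        unsigned long cp = 0;

        if (*p == '\0')
            return 0;                   /* no digits */
        for (; *p; p++) {
            int d;
            if (*p >= '0' && *p <= '9') d = *p - '0';
            else if (base == 16 && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
            else if (base == 16 && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
            else return 0;
            cp = cp * base + d;
            if (cp > 0x10FFFF)
                return 0;               /* out of range; also avoids overflow */
        }
        if (cp == 0)
            return 0;                   /* NUL is not allowed in XML */
        return utf8_encode(cp, out);
    }
    for (i = 0; i < sizeof predefined / sizeof predefined[0]; i++) {
        if (strcmp(name, predefined[i].name) == 0) {
            out[0] = predefined[i].ch;
            return 1;
        }
    }
    return 0;
}
```

    In the tokenizer, the single `*b++=i;` store would become a copy of the returned number of bytes.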

    /*--
    Cat pdlib;Parse;XML
    Form
    char *NetParse_XML_ParseText(char *s, char *b);
    Description
    Parse a glob of text from the stream.
    Handles escapes and such.
    Status Internal
    --*/
    char *NetParse_XML_ParseText(char *s, char *b)
    {
        char *t;
        char ent[16];   /* entity name between '&' and ';' */
        int i, gws, rws;

        /* no bounds check: b must be larger than any run of text */
        if(!b)b=kralloc(4096);
        *b=0;

        s=NetParse_XML_EatWhite(s);
        if(!*s)return(NULL);

        gws=0;  /* trailing spaces generated for line breaks */
        rws=0;  /* length of the current run of real whitespace */
        while(1)
        {
            while(*s && !NetParse_XML_ContSpecialP(s))
            {
                if((*s=='\r') || (*s=='\n'))
                {
                    /* a line break and the indentation after it become
                       one space, unless whitespace already precedes it */
                    s=NetParse_XML_EatWhite(s);
                    if(!rws)
                    {
                        *b++=' ';
                        gws++;
                    }
                    continue;
                }
                gws=0;
                if((unsigned char)*s<=' ')rws++;
                else rws=0;
                *b++=*s++;
            }
            if(!*s)return(NULL);    /* text must be followed by markup */
            if(*s!='&')break;       /* reached '<' */

            /* entity or character reference: &name; */
            s++;
            t=ent;
            while(*s && (*s!=';') && ((t-ent)<15))*t++=*s++;
            if(*s!=';')return(NULL);
            *t=0;
            s++;

            if(ent[0]=='#')
            {
                /* values above 255 do not fit in one char and are truncated */
                if(ent[1]=='x')
                    i=(int)strtol(ent+2, NULL, 16);
                else i=atoi(ent+1);
                gws=0;
                if(i<=' ')rws++;
                else rws=0;
                *b++=i;
                continue;
            }
            rws=0;
            gws=0;

            if(!strcmp(ent, "amp"))*b++='&';
            else if(!strcmp(ent, "lt"))*b++='<';
            else if(!strcmp(ent, "gt"))*b++='>';
            else if(!strcmp(ent, "apos"))*b++='\'';
            else if(!strcmp(ent, "quot"))*b++='"';
        }
        b-=gws;     /* drop generated trailing spaces */
        *b=0;

        return(s);
    }
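
    The whitespace rule used by NetParse_XML_ParseText (a line break plus the indentation after it becomes one space unless whitespace already precedes it, and a space produced that way at the very end is dropped) can be shown in isolation. A sketch on a plain C string; collapse_breaks is a hypothetical name, and entities are not handled:

```c
#include <string.h>   /* size_t */

/* Copy src to dst, turning each line break (plus the whitespace after it)
   into one space unless whitespace was copied just before it. Leading
   whitespace is skipped and a trailing space produced by a line break is
   dropped. dst needs at least strlen(src)+1 bytes. Returns the length of
   the result. */
size_t collapse_breaks(const char *src, char *dst)
{
    char *d = dst;
    int gen = 0;   /* last char written is a space we generated */
    int ws = 0;    /* last char copied from src was whitespace */

    while (*src && (unsigned char)*src <= ' ')
        src++;                              /* leading whitespace */
    while (*src) {
        if (*src == '\r' || *src == '\n') {
            while (*src && (unsigned char)*src <= ' ')
                src++;                      /* the break and indentation */
            if (!ws) {
                *d++ = ' ';
                gen = 1;
            }
            continue;
        }
        gen = 0;
        ws = ((unsigned char)*src <= ' ');
        *d++ = *src++;
    }
    if (gen)
        d--;                                /* drop generated trailing space */
    *d = '\0';
    return (size_t)(d - dst);
}
```

    Spaces and tabs on a single line are kept as they are; only line structure is folded, which suits pretty-printed documents.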

    /*--
    Cat pdlib;Parse;XML
    Form
    NetParse_Attr *NetParse_XML_ParseOpts(char **s);
    Description
    Parse the list of attributes within a tag.
    Status Internal
    --*/
    NetParse_Attr *NetParse_XML_ParseOpts(char **s)
    {
        char *ns, *var, *eq, *val;
        int ty;
        NetParse_Attr *lst, *end, *tmp;

        /* scratch buffers from the rotating allocator (never freed) */
        ns=kralloc(256);
        var=kralloc(256);
        eq=kralloc(256);
        val=kralloc(4096);

        lst=NULL;
        end=NULL;

        while(1)
        {
            /* peek: '>', '/' or '?' ends the list and is left for the caller */
            NetParse_XML_Token(*s, var, &ty);
            if(ty==TOKEN_NULL)goto truncated;
            if((ty==TOKEN_SPECIAL) &&
                ((var[0]=='>') || (var[0]=='/') || (var[0]=='?')))
                break;
            if(ty!=TOKEN_SYMBOL)
            {
                kprint("parse error (inv attribute).\n");
                goto error;
            }

            /* name, or prefix ':' name */
            *s=NetParse_XML_Token(*s, var, &ty);
            if(ty==TOKEN_NULL)goto truncated;
            *s=NetParse_XML_Token(*s, eq, &ty);
            if(ty==TOKEN_NULL)goto truncated;

            if((ty==TOKEN_SPECIAL) && (eq[0]==':'))
            {
                strcpy(ns, var);
                *s=NetParse_XML_Token(*s, var, &ty);
                if(ty==TOKEN_NULL)goto truncated;
                *s=NetParse_XML_Token(*s, eq, &ty);
                if(ty==TOKEN_NULL)goto truncated;
            }else ns[0]=0;

            if((ty!=TOKEN_SPECIAL) || (eq[0]!='='))
            {
                kprint("parse error (attr equal).\n");
                goto error;
            }

            /* the value must be a quoted string */
            *s=NetParse_XML_Token(*s, val, &ty);
            if(ty==TOKEN_NULL)goto truncated;
            if(ty!=TOKEN_STRING)
            {
                kprint("parse error (inv attribute arg).\n");
                goto error;
            }

            tmp=NetParse_NewAttr();
            if(ns[0])tmp->ns=kstrdup(ns);
            tmp->key=kstrdup(var);
            tmp->value=kstrdup(val);

            /* append, keeping document order */
            if(end)end->next=tmp;
            else lst=tmp;
            end=tmp;
        }
        return(lst);

    truncated:
        kprint("attribute list truncated\n");
        *s=NULL;    /* tells the caller that the input ran out */
    error:
        if(lst)NetParse_FreeAttr(lst);
        return(NULL);
    }

    /*--
    Cat pdlib;Parse;XML
    Form
    NetParse_Node *NetParse_XML_ParseExpr(char **s);
    Description
    Parses an XML expression.
    s is updated to reflect the change.

    NULL is returned on parse error or end-of-stream.
    s is not updated for end of stream conditions, which can be used to
    separate it from a parse error.
    --*/
    NetParse_Node *NetParse_XML_ParseExpr(char **s)
    {
        char *buf, *buf2, *key, *ns;
        char *s2, *s3, *is;
        int ty, i;
        NetParse_Node *tmp, *t, *end, *ret;

        is=*s;
        *s=NetParse_XML_EatWhite(*s);
        if(!**s)
        {
            *s=is;
            return(NULL);
        }

        buf=kalloc(256);
        buf2=kalloc(256);
        key=kalloc(256);
        ns=kalloc(256);
        tmp=NULL;   /* element under construction; freed on failure */
        ret=NULL;

        NetParse_XML_Token(*s, buf, &ty);   /* peek */
        if(ty==TOKEN_NULL)goto eos;

        if((ty!=TOKEN_SPECIAL) || (buf[0]!='<'))
        {
            /* character data up to the next '<'; ParseText does not check
               bounds, so the buffer must be larger than any text run */
            s2=kalloc(65536);
            s3=NetParse_XML_ParseText(*s, s2);
            if(s3)
            {
                *s=s3;
                ret=NetParse_NewNode();
                ret->text=kstrdup(s2);
            }
            kfree(s2);
            if(!ret)goto eos;
            goto done;
        }

        *s=NetParse_XML_Token(*s, buf, &ty);    /* consume '<' */
        if(ty==TOKEN_NULL)goto eos;

        if(**s=='?')*s=*s+1;    /* <?target ...?> is read like a tag */
        if(**s=='!')
        {
            if(!strncmp(*s+1, "[CDATA[", 7))
            {
                /* CDATA: everything up to the first "]]>" is literal text */
                s2=strstr(*s+8, "]]>");
                if(!s2)goto eos;
                *s2=0;
                ret=NetParse_NewNode();
                ret->text=kstrdup(*s+8);
                *s2=']';
                *s=s2+3;
                goto done;
            }

            /* comment or declaration: skip to the matching '>', counting
               brackets so an internal DTD subset is skipped as a whole
               (a stray '>' inside a comment still ends it early) */
            s2=*s;
            i=1;
            while(*s2 && i)
            {
                if((*s2=='<') || (*s2=='['))i++;
                if((*s2=='>') || (*s2==']'))i--;
                s2++;
            }
            *s=NetParse_XML_EatWhite(s2);
            if(((*s)[0]=='<') && ((*s)[1]=='/'))
            {
                /* only an end tag follows: return an empty text node so
                   the caller's child loop can see the end tag itself */
                ret=NetParse_NewNode();
                ret->text=kstrdup("");
                goto done;
            }
            ret=NetParse_XML_ParseExpr(s);
            goto done;
        }

        /* start tag: name or prefix:name */
        *s=NetParse_XML_Token(*s, key, &ty);
        if(ty==TOKEN_NULL)goto eos;
        if(ty!=TOKEN_SYMBOL)
        {
            kprint("parse error (inv tag).\n");
            goto done;
        }
        if(**s==':')
        {
            *s=*s+1;
            strcpy(ns, key);
            *s=NetParse_XML_Token(*s, key, &ty);
            if(ty==TOKEN_NULL)goto eos;
        }else ns[0]=0;

        if(((unsigned char)**s>' ') && (**s!='>') && (**s!='/') && (**s!='?'))
        {
            kprint("parse error (inv char after tag).\n");
            goto done;
        }

        tmp=NetParse_NewNode();
        if(ns[0])tmp->ns=kstrdup(ns);
        tmp->key=kstrdup(key);
        tmp->attr=NetParse_XML_ParseOpts(s);
        if(!*s)
        {
            kprint("attr truncated\n");
            goto eos;
        }

        *s=NetParse_XML_Token(*s, buf, &ty);
        if(ty==TOKEN_NULL)goto eos;
        if((ty==TOKEN_SPECIAL) && (buf[0]=='/'))
        {
            /* empty-element tag: <name .../> */
            *s=NetParse_XML_Token(*s, buf, &ty);
            if(ty==TOKEN_NULL)goto eos;
            ret=tmp;
            goto done;
        }
        if((ty==TOKEN_SPECIAL) && (buf[0]=='?'))
        {
            /* end of a processing instruction; stored with key "?target" */
            *s=NetParse_XML_Token(*s, buf, &ty);
            if(ty==TOKEN_NULL)goto eos;
            buf[0]='?';
            strcpy(buf+1, tmp->key);
            kfree(tmp->key);
            tmp->key=kstrdup(buf);
            ret=tmp;
            goto done;
        }
        if(buf[0]!='>')
        {
            kprint("parse error (expected close '>').\n");
            goto done;
        }

        /* content: child nodes until the end tag */
        end=NULL;
        while(1)
        {
            /* peek at the next two tokens to spot "</" */
            s2=NetParse_XML_Token(*s, buf, &ty);
            if(ty==TOKEN_NULL)goto eos;
            s2=NetParse_XML_Token(s2, buf2, &ty);
            if(ty==TOKEN_NULL)goto eos;

            if((buf[0]=='<') && (buf2[0]=='/'))
            {
                /* end tag: skip through '>' (this also copes with
                   prefix:name); the name is not checked against the
                   start tag */
                s2=strchr(s2, '>');
                if(!s2)goto eos;
                *s=s2+1;
                break;
            }

            s3=*s;
            t=NetParse_XML_ParseExpr(s);
            if(*s==s3)goto eos;     /* no progress: the input ran out */
            if(!t)goto done;        /* parse error in a child */

            if(end)end->next=t;
            else tmp->first=t;
            end=t;
        }
        ret=tmp;

    done:
        if(tmp && (ret!=tmp))NetParse_FreeNode(tmp);
        kfree(buf);
        kfree(buf2);
        kfree(key);
        kfree(ns);
        return(ret);

    eos:
        /* end of input: put *s back so the caller can tell this apart
           from a parse error */
        *s=is;
        ret=NULL;
        goto done;
    }
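
    Inside a CDATA section nothing is escaped, so its end is simply the first "]]>". A standalone sketch of extracting one with strstr; cdata_extract is a hypothetical name, and it returns a malloc'd copy rather than using pdlib's allocator:

```c
#include <stdlib.h>
#include <string.h>

/* If p starts with "<![CDATA[", return a malloc'd copy of the section's
   contents and set *after to the text following "]]>". Returns NULL if p
   is not a CDATA section, the section is unterminated, or allocation
   fails. Nothing inside is unescaped: "&amp;" stays "&amp;". */
char *cdata_extract(const char *p, const char **after)
{
    static const char open[] = "<![CDATA[";
    const char *start, *end;
    char *copy;

    if (strncmp(p, open, sizeof open - 1) != 0)
        return NULL;
    start = p + sizeof open - 1;
    end = strstr(start, "]]>");
    if (end == NULL)
        return NULL;                    /* unterminated section */
    copy = malloc((size_t)(end - start) + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, (size_t)(end - start));
    copy[end - start] = '\0';
    *after = end + 3;                   /* skip "]]>" */
    return copy;
}
```

    A lone "]]" that is not followed by '>' stays part of the content, since only the full "]]>" sequence ends the section.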

    /*--
    Cat pdlib;Parse;XML
    Form
    NetParse_Node *NetParse_XML_LoadFile(char *name);
    Description
    loads XML from a file.
    returns NULL on failure.
    --*/
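    NetParse_Node *NetParse_XML_LoadFile(char *name)
    {
        VFILE *fd;
        char *buf, *s;
        NetParse_Node *n;

        fd=vffopen(name, "rb");
        if(!fd)return(NULL);

        /* the parser expects the whole file as a NUL-terminated string */
        buf=vf_bufferin(fd);
        if(!buf)return(NULL);

        /* skip processing instructions such as <?xml ...?> and any stray
           top-level text (key is NULL); the first element is the root */
        s=buf;
        while(*s)
        {
            n=NetParse_XML_ParseExpr(&s);
            if(!n)break;
            if(!n->key || (n->key[0]=='?'))
            {
                NetParse_FreeNode(n);
                continue;
            }
            return(n);
        }
        return(NULL);
    }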
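
    vffopen and vf_bufferin are pdlib's file layer. With plain stdio, the same "whole file as one NUL-terminated string" step might look like the sketch below; load_file is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd, NUL-terminated buffer.
   Returns NULL on error; the caller frees the result. */
char *load_file(const char *name)
{
    FILE *fp = fopen(name, "rb");
    char *buf = NULL;
    size_t len = 0, cap = 0, n;

    if (fp == NULL)
        return NULL;
    for (;;) {
        /* keep at least 4 KB of free space, doubling the buffer */
        if (cap - len < 4096) {
            size_t ncap = cap ? cap * 2 : 8192;
            char *nbuf = realloc(buf, ncap);
            if (nbuf == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = nbuf;
            cap = ncap;
        }
        n = fread(buf + len, 1, cap - len - 1, fp);  /* room for the NUL */
        len += n;
        if (n == 0)
            break;                      /* end of file or read error */
    }
    if (ferror(fp)) {
        free(buf);
        buf = NULL;
    } else {
        buf[len] = '\0';
    }
    fclose(fp);
    return buf;
}
```

    The parser works on C strings, so a file that contains a NUL byte would look truncated to it.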

    and part of the crap for dealing with parse trees:
    ----
    /*--
    Cat pdlib;Parse
    Form
    int NetParse_Init();
    Description
    Init function for NetParse, called implicitly by node/attr creation.
    --*/
    int NetParse_Init()
    {
        static int init=0;

        if(init)return(1);
        init=1;

        ObjType_NewType("netparse_attr_t", "*struct;string;string;");
        ObjType_NewType("netparse_node_t",
            "*struct;string;string;*struct;*struct;");
        return(0);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Attr *NetParse_NewAttr();
    Description
    Creates a new attribute.
    --*/
    NetParse_Attr *NetParse_NewAttr()
    {
        NetParse_Attr *tmp;

        NetParse_Init();

        tmp=ObjType_New("netparse_attr_t", sizeof(NetParse_Attr));
        tmp->next=NULL;
        tmp->ns=NULL;
        tmp->key=NULL;
        tmp->value=NULL;

        return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char
    *value);
    Description
    Adds an attribute to a node (or sets the attribute if present).
    --*/
    NetParse_Attr *NetParse_AddAttr(NetParse_Node *node, char *key, char *value)
    {
        NetParse_Attr *tmp, *cur;

        /* if the key is already present, just replace its value */
        cur=node->attr;
        while(cur)
        {
            if(!strcmp(cur->key, key))
            {
                if(cur->value)kfree(cur->value);
                cur->value=kstrdup(value);
                return(cur);
            }
            cur=cur->next;
        }

        tmp=NetParse_NewAttr();
        tmp->next=NULL;
        tmp->key=kstrdup(key);
        tmp->value=kstrdup(value);

        /* otherwise append it at the end of the list */
        if(!node->attr)
        {
            node->attr=tmp;
            return(tmp);
        }
        cur=node->attr;
        while(cur->next)cur=cur->next;
        cur->next=tmp;

        return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char
    *value);
    Description
    Adds an attribute to a list of attributes (or assigns the attribute \
    if already present).
    Returns the start of the list, or the new attribute if lst is NULL.
    --*/
    NetParse_Attr *NetParse_AddAttrList(NetParse_Attr *lst, char *key, char *value)
    {
        NetParse_Attr *tmp, *cur;

        /* if the key is already present, just replace its value */
        cur=lst;
        while(cur)
        {
            if(!strcmp(cur->key, key))
            {
                if(cur->value)kfree(cur->value);
                cur->value=kstrdup(value);
                return(lst);
            }
            cur=cur->next;
        }

        tmp=NetParse_NewAttr();
        tmp->next=NULL;
        tmp->key=kstrdup(key);
        tmp->value=kstrdup(value);

        if(!lst)return(tmp);

        cur=lst;
        while(cur->next)cur=cur->next;
        cur->next=tmp;

        return(lst);
    }

    /*--
    Cat pdlib;Parse
    Form
    char *NetParse_GetNodeAttr(NetParse_Node *node, char *key);
    Description
    Gets an attribute associated with a node.
    Returns NULL if not found.
    --*/
    char *NetParse_GetNodeAttr(NetParse_Node *node, char *key)
    {
        NetParse_Attr *cur;

        cur=node->attr;
        while(cur)
        {
            if(!strcmp(cur->key, key))return(cur->value);
            cur=cur->next;
        }
        return(NULL);
    }

    /*--
    Cat pdlib;Parse
    Form
    int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value);
    Description
    Check if a given node has a certain attribute as a certain value.
    --*/
    int NetParse_GetNodeAttrIsP(NetParse_Node *node, char *key, char *value)
    {
        NetParse_Attr *cur;

        cur=node->attr;
        while(cur)
        {
            if(!strcmp(cur->key, key))
                return(!strcmp(cur->value, value));
            cur=cur->next;
        }
        return(0);
    }

    /*--
    Cat pdlib;Parse
    Form
    char *NetParse_GetAttrList(NetParse_Attr *lst, char *key);
    Description
    Gets an attribute in a list.
    Returns NULL if not found.
    --*/
    char *NetParse_GetAttrList(NetParse_Attr *lst, char *key)
    {
        NetParse_Attr *cur;

        cur=lst;
        while(cur)
        {
            if(!strcmp(cur->key, key))return(cur->value);
            cur=cur->next;
        }
        return(NULL);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_NewNode();
    Description
    Creates a new node.
    --*/
    NetParse_Node *NetParse_NewNode()
    {
        NetParse_Node *tmp;

        NetParse_Init();

        tmp=ObjType_New("netparse_node_t", sizeof(NetParse_Node));
        tmp->next=NULL;
        tmp->ns=NULL;
        tmp->key=NULL;
        tmp->text=NULL;
        tmp->attr=NULL;
        tmp->first=NULL;

        return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
    *node);
    Description
    Adds a new node at the end of a list of nodes.
    --*/
    NetParse_Node *NetParse_AddNodeEnd(NetParse_Node *first, NetParse_Node
    *node)
    {
    NetParse_Node *cur;

    if(!first)return(node);

    cur=first;
    while(cur->next)cur=cur->next;
    cur->next=node;

    return(first);
    }

    /*--
    Cat pdlib;Parse
    Form
    int NetParse_AddChildNode(NetParse_Node *parent, NetParse_Node *node);
    Description
    Add a new child node to a parent.
    --*/
    int NetParse_AddChildNode(NetParse_Node *parent,
    NetParse_Node *node)
    {
    NetParse_Node *cur;

    if(!parent->first)
    {
    parent->first=node;
    return(0);
    }

    cur=parent->first;
    while(cur->next)cur=cur->next;
    cur->next=node;

    return(0);
    }

    /*--
    Cat pdlib;Parse
    Form
    int NetParse_FreeAttr(NetParse_Attr *attr);
    Description
    Frees an attribute.
    Also frees any following attributes.
    --*/
    int NetParse_FreeAttr(NetParse_Attr *attr)
    {
    if(attr->next)NetParse_FreeAttr(attr->next);
    if(attr->key)kfree(attr->key);
    if(attr->value)kfree(attr->value);
    kfree(attr);

    return(0);
    }

    /*--
    Cat pdlib;Parse
    Form
    int NetParse_FreeNode(NetParse_Node *node);
    Description
    Frees a node and any associated attributes.
    Also frees any child nodes.
    --*/
    int NetParse_FreeNode(NetParse_Node *node)
    {
    NetParse_Node *cur, *next;

    if(node->key)kfree(node->key);
    if(node->text)kfree(node->text);
    if(node->attr)NetParse_FreeAttr(node->attr);

    //free each child together with its whole subtree
    cur=node->first;
    while(cur)
    {
    next=cur->next;
    NetParse_FreeNode(cur);
    cur=next;
    }
    kfree(node);

    return(0);
    }
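NetParse_FreeNode relies on the first-child/next-sibling layout used throughout this code: a node's children form a singly linked list starting at first and chained through next, so releasing a subtree means walking that list and recursing into every child. Below is a minimal standalone sketch of that pattern. It assumes plain malloc/free in place of kalloc/kfree and ObjType_New (their definitions are not shown in the thread); node_new, node_add_child, node_free and the live_nodes counter are hypothetical names, and the counter exists only so a caller can check that nothing leaks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for NetParse_Node: a node's children are a singly
   linked list that starts at `first` and is chained through `next`. */
typedef struct Node {
    char *key;
    struct Node *first, *next;
} Node;

static int live_nodes = 0;   /* number of nodes currently allocated */

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *r = malloc(n);
    return r ? memcpy(r, s, n) : NULL;
}

Node *node_new(const char *key)
{
    Node *n = calloc(1, sizeof *n);
    if (!n) return NULL;
    n->key = dup_str(key);
    live_nodes++;
    return n;
}

/* Appends child at the end of parent's child list, as NetParse_AddChildNode does. */
void node_add_child(Node *parent, Node *child)
{
    Node **link = &parent->first;
    while (*link) link = &(*link)->next;
    *link = child;
}

/* Frees node and its whole subtree: walk the child list and recurse into
   each child, so every grandchild (not just the first) is released too.
   Like NetParse_FreeNode, it does not follow node->next. */
void node_free(Node *node)
{
    Node *cur = node->first;
    while (cur) {
        Node *next = cur->next;   /* read before cur is freed */
        node_free(cur);
        cur = next;
    }
    free(node->key);
    free(node);
    live_nodes--;
}
```

With a root holding children a and b, and a holding a1 and a2, live_nodes reads 5; one call to node_free(root) brings it back to 0, second grandchild included. A node that is still linked into a sibling list should be unlinked before it is freed on its own.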

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr);
    Description
    Copies an attribute along with any following attributes.
    --*/
    NetParse_Attr *NetParse_CopyAttr(NetParse_Attr *attr)
    {
    NetParse_Attr *tmp;

    tmp=NetParse_NewAttr();

    if(attr->next)
    tmp->next=NetParse_CopyAttr(attr->next);
    if(attr->key)
    tmp->key=kstrdup(attr->key);
    if(attr->value)
    tmp->value=kstrdup(attr->value);

    return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_CopyNode(NetParse_Node *node);
    Description
    Makes a deep copy of a node tree, including any attributes and children.
    --*/
    NetParse_Node *NetParse_CopyNode(NetParse_Node *node)
    {
    NetParse_Node *cur;
    NetParse_Node *tmp, *lst, *end, *t2;

    tmp=NetParse_NewNode();
    if(node->key)
    tmp->key=kstrdup(node->key);
    if(node->text)
    tmp->text=kstrdup(node->text);
    if(node->attr)
    tmp->attr=NetParse_CopyAttr(node->attr);

    lst=NULL;
    end=NULL;
    cur=node->first;
    while(cur)
    {
    t2=NetParse_CopyNode(cur);
    if(end)end->next=t2;
    end=t2;

    if(!lst)lst=end;
    cur=cur->next;
    }
    tmp->first=lst;

    return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key);
    Description
    Finds a node in a list with a given key.
    Returns NULL if not found.
    --*/
    NetParse_Node *NetParse_FindKey(NetParse_Node *first, char *key)
    {
    NetParse_Node *cur;

    cur=first;
    while(cur)
    {
    if(cur->key)
    if(!strcmp(cur->key, key))
    return(cur);
    cur=cur->next;
    }
    return(NULL);
    }
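NetParse_FindKey scans a single sibling list. Mike's first post asked for a search that takes the tree pointer, a tag, and an optional attribute; one way to get that is to combine a FindKey-style tag match with a GetNodeAttrIsP-style attribute check in a depth-first walk. The sketch below assumes simplified stand-ins for NetParse_Node and NetParse_Attr; has_attr and find_elem are hypothetical names, not part of the posted library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for NetParse_Attr and NetParse_Node. */
typedef struct Attr {
    char *key, *value;
    struct Attr *next;
} Attr;

typedef struct Node {
    char *key;                  /* tag name; NULL for text nodes */
    Attr *attr;
    struct Node *first, *next;  /* first child, next sibling */
} Node;

/* 1 if node has attribute `key` whose value equals `value`, like NetParse_GetNodeAttrIsP. */
static int has_attr(const Node *node, const char *key, const char *value)
{
    for (const Attr *a = node->attr; a; a = a->next)
        if (!strcmp(a->key, key))
            return !strcmp(a->value, value);
    return 0;
}

/* Depth-first search of the sibling list starting at `first` and all of its
   descendants for an element named `tag`. If attr_key is non-NULL, the element
   must also have attr_key set to attr_value. Returns NULL when nothing matches. */
Node *find_elem(Node *first, const char *tag,
                const char *attr_key, const char *attr_value)
{
    for (Node *cur = first; cur; cur = cur->next) {
        if (cur->key && !strcmp(cur->key, tag) &&
            (!attr_key || has_attr(cur, attr_key, attr_value)))
            return cur;
        Node *hit = find_elem(cur->first, tag, attr_key, attr_value);
        if (hit)
            return hit;
    }
    return NULL;
}
```

The walk checks a node, then its descendants, then its later siblings, so it returns the first match in document order; passing NULL for attr_key matches on the tag alone.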

    //abstract interface funcs

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_NewNodeKey(char *ns, char *key);
    Description
    Creates a new node with a given namespace prefix and key.
    ns may be NULL in most cases (the tag does not have a namespace \
    prefix).
    --*/
    NetParse_Node *NetParse_NewNodeKey(char *ns, char *key)
    {
    NetParse_Node *tmp;

    tmp=NetParse_NewNode();
    if(ns)tmp->ns=kstrdup(ns);
    tmp->key=kstrdup(key);

    return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    NetParse_Node *NetParse_NewNodeText(char *text);
    Description
    Creates a new text node with the contents given.
    --*/
    NetParse_Node *NetParse_NewNodeText(char *text)
    {
    NetParse_Node *tmp;

    tmp=NetParse_NewNode();
    tmp->text=kstrdup(text);

    return(tmp);
    }

    /*--
    Cat pdlib;Parse
    Form
    char *NetParse_GetNodeNS(NetParse_Node *node);
    char *NetParse_GetNodeKey(NetParse_Node *node);
    char *NetParse_GetNodeText(NetParse_Node *node);
    NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node);
    NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node);
    Description
    Get a property of a node; each returns NULL if \
    the given property is not present.
    --*/
    char *NetParse_GetNodeNS(NetParse_Node *node)
    {
    return(node->ns);
    }

    char *NetParse_GetNodeKey(NetParse_Node *node)
    {
    return(node->key);
    }

    char *NetParse_GetNodeText(NetParse_Node *node)
    {
    return(node->text);
    }

    NetParse_Node *NetParse_GetNodeFirst(NetParse_Node *node)
    {
    return(node->first);
    }

    NetParse_Node *NetParse_GetNodeNext(NetParse_Node *node)
    {
    return(node->next);
    }

    /*--
    Cat pdlib;Parse
    Form
    int NetParse_SetNodeNS(NetParse_Node *node, char *value);
    int NetParse_SetNodeKey(NetParse_Node *node, char *value);
    int NetParse_SetNodeText(NetParse_Node *node, char *value);
    int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2);
    int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2);
    Description
    Set a property of a node; the return value is 0 if no errors \
    occur.
    --*/
    int NetParse_SetNodeNS(NetParse_Node *node, char *value)
    {
    node->ns=kstrdup(value);
    return(0);
    }

    int NetParse_SetNodeKey(NetParse_Node *node, char *value)
    {
    node->key=kstrdup(value);
    return(0);
    }

    int NetParse_SetNodeText(NetParse_Node *node, char *value)
    {
    node->text=kstrdup(value);
    return(0);
    }

    int NetParse_SetNodeFirst(NetParse_Node *node, NetParse_Node *node2)
    {
    node->first=node2;
    return(0);
    }

    int NetParse_SetNodeNext(NetParse_Node *node, NetParse_Node *node2)
    {
    node->next=node2;
    return(0);
    }
     
    cr88192, Nov 1, 2004
    #16
  17. Mike

    Mike Guest

    In article <>, Mike wrote:
    > In article <>, William Park wrote:
    >> Mike <> wrote:
    >>> XML has been chosen, I need to write the parser. Oh, and I do not have
    >>> to validate the XML, just parse it.

    >>
    >> Expat (www.libexpat.org). Practically every language has some sort of
    >> support for it, even Bash shell.
    >>

    >
    > Thanks for the expat suggestion. I have also read about libxml. I'd like to
    > find a few hundred lines of c code to do this.
    >
    > Mike


    Thanks for all the replies. I have chosen and am using mxml.

    Mike
     
    Mike, Nov 11, 2004
    #17
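Mike settled on mxml, but for readers who, like him, wanted a few hundred lines of C rather than a full library, here is a rough sketch of how small a non-validating parser can be. It is an illustration under stated assumptions, not a conforming XML parser: it builds a first-child/next-sibling tree similar to NetParse_Node, skips comments and <?...?> processing instructions, keeps text runs verbatim (whitespace included, no entity decoding), requires quoted attribute values, and does not handle DOCTYPE declarations or CDATA sections. All names (Node, Attr, xml_parse, node_free) are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical first-child/next-sibling tree, loosely modelled on NetParse_Node:
   element nodes have a tag; text nodes have tag == NULL and a text. */
typedef struct Attr { char *key, *value; struct Attr *next; } Attr;
typedef struct Node { char *tag, *text; Attr *attr; struct Node *first, *next; } Node;

static char *dupn(const char *s, size_t n)
{
    char *r = malloc(n + 1);   /* malloc failures are not checked, for brevity */
    memcpy(r, s, n); r[n] = '\0';
    return r;
}

static void skip_ws(const char **s) { while (isspace((unsigned char)**s)) (*s)++; }
/* Skips a <!--comment--> or <?...?> instruction (returns 1), or returns 0. */
static int skip_misc(const char **s)
{
    const char *term = !strncmp(*s, "<!--", 4) ? "-->"
                     : !strncmp(*s, "<?", 2)   ? "?>" : NULL;
    if (!term) return 0;
    const char *end = strstr(*s + 2, term);   /* unterminated: jump to the end */
    *s = end ? end + strlen(term) : *s + strlen(*s);
    return 1;
}

/* Frees a node, its attributes and children, and all of its following siblings. */
void node_free(Node *n)
{
    while (n) {
        Node *next = n->next;
        for (Attr *a = n->attr, *an; a; a = an) {
            an = a->next; free(a->key); free(a->value); free(a);
        }
        node_free(n->first);
        free(n->tag); free(n->text); free(n);
        n = next;
    }
}

/* Parses one element starting at '<' and advances *p past it; NULL if malformed. */
static Node *parse_elem(const char **p)
{
    const char *s = *p + 1, *start = s;
    Node *n = calloc(1, sizeof *n);
    Node **tail = &n->first; Attr **atail = &n->attr;
    while (*s && !isspace((unsigned char)*s) && *s != '>' && *s != '/') s++;
    n->tag = dupn(start, s - start);
    for (;;) {                              /* attributes: key="v" or key='v' */
        skip_ws(&s);
        if (s[0] == '/' && s[1] == '>') { *p = s + 2; return n; }
        if (*s == '>') { s++; break; }
        for (start = s; *s && *s != '=' && !isspace((unsigned char)*s); s++) ;
        Attr *a = calloc(1, sizeof *a);
        a->key = dupn(start, s - start);
        *atail = a; atail = &a->next;
        skip_ws(&s);
        if (*s != '=') goto fail;
        s++; skip_ws(&s);
        char q = *s;
        if (q != '"' && q != '\'') goto fail;
        for (start = ++s; *s && *s != q; s++) ;
        if (!*s) goto fail;
        a->value = dupn(start, s - start); s++;   /* step past the closing quote */
    }
    for (;;) {                              /* children, up to the matching </tag> */
        if (skip_misc(&s)) continue;
        if (!*s) goto fail;
        if (s[0] == '<' && s[1] == '/') {
            size_t len = strlen(n->tag);
            if (strncmp(s + 2, n->tag, len)) goto fail;
            s += 2 + len; skip_ws(&s);
            if (*s != '>') goto fail;
            *p = s + 1; return n;
        }
        Node *c;
        if (*s == '<') {
            if (!(c = parse_elem(&s))) goto fail;
        } else {                            /* text run, kept verbatim */
            for (start = s; *s && *s != '<'; s++) ;
            c = calloc(1, sizeof *c);
            c->text = dupn(start, s - start);
        }
        *tail = c; tail = &c->next;
    }
fail:
    node_free(n);
    return NULL;
}

/* Parses a whole document string; returns its root element, or NULL. */
Node *xml_parse(const char *s)
{
    do skip_ws(&s); while (skip_misc(&s));
    return *s == '<' ? parse_elem(&s) : NULL;
}
```

For instance, parsing <a x="1"><b>hi</b><c/></a> yields an element a with attribute x="1", whose first child is b (holding the text node "hi"), followed by an empty element c; node_free releases the whole tree. Mismatched tags such as <a><b></a> make xml_parse return NULL.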