Expat problems

Discussion in 'XML' started by Jakob Møbjerg Nielsen, Nov 20, 2003.

  1. Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
    buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

    X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)



    -----Source code-----
    #include <stdio.h>
    #include <expat.h>

    void startElement(void *userData, const char *name, const char **atts)
    {
    printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
    }

    void endElement(void *userData, const char *name)
    {
    }

    int main(int argc, char *argv[])
    {
    FILE *fp;
    char *buffer;
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
    fprintf(stderr, "%s: Can't open %s", prog, *argv);
    exit(1);
    } else {
    fseek(fp, 0, SEEK_END);
    fsize = ftell(fp);
    rewind(fp);

    buffer = (char *)malloc(fsize+1);

    if (buffer == NULL)
    exit(2);

    fread(buffer, 1, fsize, fp);

    buffer[fsize] = '\0';

    printf("%s\n", buffer);

    fclose(fp);

    parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
    XML_SetUserData(parser, &userData);
    XML_SetElementHandler(parser, startElement, endElement);
    do {
    done = fsize < sizeof(buffer);
    if (!XML_Parse(parser, buffer, fsize, 0)) {
    fprintf(stderr,
    "%s at line %d\n",
    XML_ErrorString(XML_GetErrorCode(parser)),
    XML_GetCurrentLineNumber(parser));
    return 1;
    }
    } while (!done);

    XML_ParserFree(parser);

    }

    return 0;
    }
    -------------------

    -----XML input-----
    <?xml version="1.0" ?>
    <a>
    </a>

    -------------------

    /Jakob
     
    Jakob Møbjerg Nielsen, Nov 20, 2003
    #1

  2. Jakob Møbjerg Nielsen wrote:

    > Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
    > buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?
    >
    > X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)
    >
    >
    >
    > -----Source code-----
    > #include <stdio.h>
    > #include <expat.h>

    Not a standard header. What is in here?


    >
    > void startElement(void *userData, const char *name, const char **atts)
    > {
    > printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);

    My understanding is that the printf() format specifiers are case
    sensitive, although I'm sure somebody here will correct me if I'm
    wrong.


    > }
    >
    > void endElement(void *userData, const char *name)
    > {
    > }
    >
    > int main(int argc, char *argv[])
    > {
    > FILE *fp;
    > char *buffer;
    > char *prog = argv[0];
    > long fsize;
    > XML_Parser parser;
    > int userData = 0;
    > int done;
    >
    > if(argc == 1) return 0;
    >
    > if ((fp = fopen(*++argv, "r")) == NULL) {
    > fprintf(stderr, "%s: Can't open %s", prog, *argv);
    > exit(1);
    > } else {
    > fseek(fp, 0, SEEK_END);
    > fsize = ftell(fp);
    > rewind(fp);


    There is no guarantee that the ending position of a file is the
    same as the size of the file. Character translations and other
    stuff may obscure the size. The only method to know the actual
    size of the file is to open the file in binary mode and count
    all the characters.
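
A minimal sketch of the counting approach described above, assuming a hypothetical helper named count_bytes (it is not part of the original program). It opens the file in binary mode so that no newline translation takes place, and adds up what fread() actually returns:

```c
#include <stdio.h>

/* Counts the bytes in the file at path by reading it in binary mode.
   Returns -1 if the file cannot be opened or a read error occurs.
   count_bytes is a made-up name for this illustration. */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char buf[4096];
    long total = 0;
    size_t got;

    if (fp == NULL)
        return -1;

    /* fread() returns the number of bytes it really delivered. */
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)got;

    if (ferror(fp))
        total = -1;

    fclose(fp);
    return total;
}
```

On systems that translate line endings in text mode, this count can differ from the number of characters a text-mode read would deliver.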


    >
    > buffer = (char *)malloc(fsize+1);

    In the days when memory was small and precious, input data
    was read in "chunks" rather than as a whole file.
    Granted, reading it all into memory is the most efficient method,
    but there is no guarantee that your platform, or the platform
    this program will run on, will have enough memory for the largest
    file. Hard disks are becoming larger these days.

    I say read in the data in chunks.


    >
    > if (buffer == NULL)
    > exit(2);

    You might want to be nice to the user and print a reason why
    the program is aborting.


    > fread(buffer, 1, fsize, fp);

    See above about reading in chunks.

    >
    > buffer[fsize] = '\0';
    >
    > printf("%s\n", buffer);

    You are printing the entire file here. Could take a while.
    Is this necessary?


    >
    > fclose(fp);
    >
    > parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
    > XML_SetUserData(parser, &userData);
    > XML_SetElementHandler(parser, startElement, endElement);
    > do {
    > done = fsize < sizeof(buffer);

    The expression "sizeof(buffer)" returns the size of the pointer,
    not the buffer. By the way, if you look up a few lines, you
    will note that the buffer was allocated with a size of
    "fsize + 1". So, what is this statement supposed to do?


    > if (!XML_Parse(parser, buffer, fsize, 0)) {
    > fprintf(stderr,
    > "%s at line %d\n",
    > XML_ErrorString(XML_GetErrorCode(parser)),
    > XML_GetCurrentLineNumber(parser));
    > return 1;
    > }
    > } while (!done);

    See my question about the assignment to "done" above.
    Why do you bother processing the data in chunks when
    you have read the entire file into memory?
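
To illustrate the sizeof point, here is a small sketch (the function names are made up for this example). Applied to a pointer, sizeof yields the size of the pointer itself, whatever malloc() was asked for; only a true array reports its full size:

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns what sizeof reports for a pointer to heap memory.
   The result is sizeof(char *), regardless of how much was requested. */
size_t size_seen_through_pointer(size_t requested)
{
    char *buffer = malloc(requested);
    size_t reported = sizeof(buffer);   /* size of the pointer, not the block */

    free(buffer);                       /* free(NULL) is harmless if malloc failed */
    return reported;
}

/* Returns what sizeof reports for a real array: its full size in bytes. */
size_t size_of_array(void)
{
    char array[512];

    return sizeof(array);               /* 512 */
}
```

In the posted loop this probably means that done stays 0 for any file at least as large as a pointer, so the same complete document is handed to XML_Parse() a second time. That would be consistent with the "junk after document element" report, though it is an inference from the code shown rather than something confirmed in the thread.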

    >
    > XML_ParserFree(parser);
    >
    > }
    >
    > return 0;
    > }


    I cannot comment on the correctness of the XML_*()
    function calls since I don't have that header file
    and you haven't supplied those declarations.


    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Nov 20, 2003
    #2

  3. Thomas Matthews wrote:
    >> #include <expat.h>

    > Not a standard header. What is in here?


    Expat - the XML parser.

    >> buffer = (char *)malloc(fsize+1);

    > I say read in the data in chunks.


    Well, this is just for testing with small XML files (probably not above
    1M).

    >> printf("%s\n", buffer);

    > You are printing the entire file here. Could take a while.
    > Is this necessary?


    Debugging :)
    I didn't want to start gdb just for looking at the contents of buffer.

    >> } while (!done);

    > See my question about the assignment to "done" above.
    > Why do you bother processing the data in chunks when
    > you have read the entire file into memory?


    Because, later on, the data will be streamed in from a socket.

    > I cannot comment on the correctness of the XML_*()
    > function calls since I don't have that header file
    > and you haven't supplied those declarations.


    There are quite a few:
    http://guinness.cs.stevens-tech.edu/packages/expat/reference.html

    Anyway, I've tried cleaning up a bit and played around with
    feeding the parser in a "stream-like" manner, but I still
    get that pesky "junk after document element" message. If I
    use UTF-8 I get a "not well-formed (invalid token)".

    #include <stdio.h>
    #include <expat.h>

    void startElement(void *userData, const char *name, const char **atts)
    {
    printf("Got start-element: %s\n", name);
    }

    void endElement(void *userData, const char *name)
    {
    printf("Got end-element: %s\n", name);
    }

    int main(int argc, char *argv[])
    {
    FILE *fp;
    char buffer[1];
    char *prog = argv[0];
    long fsize;
    XML_Parser parser;
    int userData = 0;
    int done;

    if(argc == 1) return 0;

    if ((fp = fopen(*++argv, "r")) == NULL) {
    fprintf(stderr, "%s: Can't open %s", prog, *argv);
    exit(1);
    } else {
    parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
    XML_SetUserData(parser, &userData);
    XML_SetElementHandler(parser, startElement, endElement);
    do {
    if (!feof(fp)) {
    buffer[0] = fgetc(fp);
    if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
    fprintf(stderr,
    "%s at line %d\n",
    XML_ErrorString(XML_GetErrorCode(parser)),
    XML_GetCurrentLineNumber(parser));
    return 1;
    }
    }
    } while (!feof(fp));
    XML_ParserFree(parser);
    }
    return 0;
    }


    --
    Jakob Møbjerg Nielsen
    http://www.jakobnielsen.dk/
    "Nine-tenths of the universe is the knowledge of the position and
    direction of everything in the other tenth."
    -- Terry Pratchett, Thief of Time
     
    Jakob Møbjerg Nielsen, Nov 21, 2003
    #3
  4. Look more carefully at an Expat example file such as elements.c,
    and copy the parsing loop (the do loop) from there.

    Replace only stdin with your FILE*. You might also want to open the file in "rb"
    (binary mode) to avoid CRLF translations.

    It seems you're trying something funny with strlen() in your code.
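
A sketch of the loop shape being suggested, written without Expat so that it stands alone. The chunk_sink callback type, feed_in_chunks(), feed_stats, and counting_sink() are hypothetical names invented for this illustration; with Expat, the sink would presumably just forward its arguments to XML_Parse(parser, data, (int)len, is_final). It tries to show two things: the length passed on is the count fread() returned (not strlen() of the buffer), and the final flag is set only on the last chunk:

```c
#include <stdio.h>

/* Hypothetical callback: receives one chunk, its real length, and a flag
   that is nonzero only for the last chunk. Returns 0 to report failure. */
typedef int (*chunk_sink)(void *ctx, const char *data, size_t len, int is_final);

/* Reads fp in fixed-size chunks and forwards each one to sink.
   Returns 1 on success, 0 on a read error or if the sink fails. */
int feed_in_chunks(FILE *fp, chunk_sink sink, void *ctx)
{
    char buf[4096];
    int done;

    do {
        /* buf is an array here, so sizeof buf is its real capacity. */
        size_t len = fread(buf, 1, sizeof buf, fp);

        if (ferror(fp))
            return 0;
        done = len < sizeof buf;        /* a short read means end of input */
        if (!sink(ctx, buf, len, done))
            return 0;
    } while (!done);

    return 1;
}

/* Example sink that only keeps statistics about what it was given. */
struct feed_stats {
    size_t bytes;
    int chunks;
    int finals;
};

int counting_sink(void *ctx, const char *data, size_t len, int is_final)
{
    struct feed_stats *stats = ctx;

    (void)data;                         /* contents are not inspected here */
    stats->bytes += len;
    stats->chunks += 1;
    if (is_final)
        stats->finals += 1;
    return 1;
}
```

If the input length is an exact multiple of the buffer size, the last call carries zero bytes with the final flag set.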

    with respect,
    Toni Uusitalo
     
    Toni Uusitalo, Nov 21, 2003
    #4
  5. In article <bpii2k$cbc$>,
    Jakob Møbjerg Nielsen <> wrote:

    % Expat keeps telling me that there is "junk after document element".

    % if ((fp = fopen(*++argv, "r")) == NULL) {
    % fprintf(stderr, "%s: Can't open %s", prog, *argv);
    % exit(1);
    % } else {
    % fseek(fp, 0, SEEK_END);
    % fsize = ftell(fp);
    % rewind(fp);
    %
    % buffer = (char *)malloc(fsize+1);
    %
    % if (buffer == NULL)
    % exit(2);
    %
    % fread(buffer, 1, fsize, fp);

    If you're not on a Unix system, ftell() might give you a larger value than
    fread() returns. You might want to check the return value of fread().
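
One way to act on this, sketched under the assumption of a hypothetical helper called read_whole_file (not from the original post): open the file in binary mode, grow the buffer as needed, and treat the total returned by fread() as the only trustworthy length, so ftell() is not needed at all:

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a malloc'd, NUL-terminated buffer.
   *out_len receives the number of bytes actually read.
   Returns NULL on failure; the caller frees the result.
   read_whole_file is a made-up name for this illustration. */
char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    size_t used = 0, cap = 0;

    if (fp == NULL)
        return NULL;

    for (;;) {
        size_t got;

        if (cap - used < 1024) {                    /* running low: grow */
            size_t new_cap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, new_cap + 1);  /* +1 for the terminator */

            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        got = fread(buf + used, 1, cap - used, fp);
        used += got;                                /* count what really arrived */
        if (got == 0)
            break;                                  /* end of file or error */
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    buf[used] = '\0';
    *out_len = used;
    return buf;
}
```

The length stored in *out_len, rather than a value from ftell(), is what would then be passed on to the parser.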

    % printf("%s\n", buffer);

    You might want to do a hex dump rather than just printing up to the first
    NUL. If there are trailing NULs after the last >, expat will give you
    an error message.
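
A minimal sketch of such a hex dump, with a companion check for trailing NUL bytes. Both hex_dump() and trailing_nuls() are hypothetical helpers written for this illustration, not functions from Expat or the original program:

```c
#include <stdio.h>

/* Writes len bytes of data to out as two-digit hex values, 16 per line. */
void hex_dump(FILE *out, const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* End the line after every 16th byte and after the last byte. */
        int end_of_line = (i % 16 == 15) || (i + 1 == len);

        fprintf(out, "%02x%c", (unsigned)data[i], end_of_line ? '\n' : ' ');
    }
}

/* Counts how many NUL bytes sit at the very end of the buffer. */
size_t trailing_nuls(const unsigned char *data, size_t len)
{
    size_t n = 0;

    while (n < len && data[len - 1 - n] == '\0')
        n++;
    return n;
}
```

Calling hex_dump(stderr, (const unsigned char *)buffer, length) just before parsing makes any bytes after the closing > visible, which printf("%s") would hide.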

    --

    Patrick TJ McPhee
    East York Canada
     
    Patrick TJ McPhee, Nov 21, 2003
    #5