XML Parser

an0047

Hello

I would like to develop a simple XML parser with my own commands.

My approach is to design a state machine first and implement it in C
later. I had a look at some posts about lexical analysers, but the
information I found was not helpful.

I know there are books that cover building tables and complicated
equations for analysing text, but I don't know how to search for them.

Any recommendation, tips on how to implement the parser, or a
literature reference (book, paper) would be greatly appreciated.

Best Regards
 
Joseph Kesselman

Many good XML parsers exist. It sounds like you're going to need to do a
fair amount of homework before constructing your own. Reinventing the
wheel is probably not very useful unless you have a real interest in
learning how parsers function.

One standard reference which covers this topic: "Compilers:
Principles, Techniques, and Tools" (Aho, Ullman, and others). You can
ignore the code-generation and optimization sections, but the parsing
portions of the task are essentially the same, and the typechecking
chapter may be relevant if you want to implement validation.
 
Juergen Kahrs

Any recommendation, tips on how to implement the parser or maybe
literature reference (book, paper) would be kindly appreciated.

This question has been answered here several times.
Google for it. Usually, we warn newbies who want to
write their own parsers. You will be surprised about
the tricky details. Have you ever heard of a BOM?
Are you prepared to process 32-bit characters?
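
To make the BOM point concrete, here is a minimal sketch in C of how the byte order mark at the start of a file could be recognised. The names `bom_encoding` and `detect_bom` are hypothetical and chosen for illustration only; the byte sequences are the standard Unicode BOM values. It only looks at the BOM and does not read the XML encoding declaration.

```c
#include <stddef.h>

/* Encodings that can be recognised from a byte order mark (BOM).
   These names are illustrative, not taken from any library. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM found */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_UTF32_BE,
    ENC_UTF32_LE
} bom_encoding;

/*
 * Inspect the first bytes of a buffer for a BOM.
 * On return, *bom_len holds the number of bytes to skip (0 if no BOM).
 */
bom_encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;

    /* The 4-byte UTF-32 marks are tested before UTF-16, because
       FF FE 00 00 begins with the UTF-16 little-endian mark FF FE. */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32_BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32_LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16_LE;
    }
    return ENC_UNKNOWN;
}
```

A caller would pass the first few bytes of the file, skip `bom_len` bytes, and then decide whether it can handle the reported encoding at all. A parser that only accepts 8-bit input could at least reject the UTF-16 and UTF-32 cases cleanly instead of misreading them.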
 
Joe Kesselman

Juergen said:
Have you ever heard of a BOM? Are you prepared to process 32-bit characters?

The usual estimate is that a complete XML parser is about the right size
to be a serious term project for a college student who already
understands the basics of writing parsers.

You can rattle off a subset in less time than that. But, again, unless
you have very special needs (such as a language where nobody has written
one yet and which can't link to existing parsers), the question is "why".
 
an0047

Many good XML parsers exist. It sounds like you're going to need to do a
fair amount of homework before constructing your own. Reinventing the
wheel is probably not very useful unless you have a real interest in
learning how parsers function.

One standard reference which covers this topic: "Compilers:
Principles, Techniques, and Tools" (Aho, Ullman, and others). You can
ignore the code-generation and optimization sections, but the parsing
portions of the task are essentially the same, and the typechecking
chapter may be relevant if you want to implement validation.

Thanks for your answer and the reference! I'm not trying to reinvent the
wheel; I'm trying to write a very simple and reliable parser for
commercial software. The ones out there are very complex and big, and
possible license violations need to be taken into consideration. If you
know of a very simple one written in C, please let me know.

I do have a real interest in learning how parsers function; that's
why I asked for a book reference. For now, the new state machine
has 5 states and can more or less handle some simple XML tags.

Regards
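
For readers following along, here is a minimal sketch of what a five-state machine for simple tags might look like in C. It is not the poster's code: the state names, the `scanner` struct and the functions `scanner_init`, `scanner_feed` and `scan_string` are all hypothetical. It assumes 8-bit characters and a tiny subset of XML: plain start and end tags with ASCII names, and no attributes, comments, CDATA sections, entities, declarations or empty-element tags. It only counts nesting depth, so it does not check that an end tag's name matches its start tag; that would need a stack of names.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    ST_TEXT,       /* between tags */
    ST_LT,         /* just read '<' */
    ST_START_TAG,  /* inside the name of a start tag: <name */
    ST_END_TAG,    /* inside the name of an end tag: </name */
    ST_ERROR       /* input left the supported subset */
} scan_state;

typedef struct {
    scan_state state;
    int depth;        /* number of currently open elements */
    int elements;     /* number of start tags seen so far */
    size_t name_len;  /* characters read in the current tag name */
} scanner;

/* ASCII-only approximation of the XML name rules (an assumption). */
static int is_name_start(unsigned char c)
{
    return isalpha(c) || c == '_' || c == ':';
}

static int is_name_char(unsigned char c)
{
    return isalnum(c) || c == '_' || c == ':' || c == '-' || c == '.';
}

void scanner_init(scanner *s)
{
    s->state = ST_TEXT;
    s->depth = 0;
    s->elements = 0;
    s->name_len = 0;
}

/* Advance the machine by one input character. */
void scanner_feed(scanner *s, unsigned char c)
{
    switch (s->state) {
    case ST_TEXT:
        if (c == '<')
            s->state = ST_LT;
        break;
    case ST_LT:
        if (c == '/') {
            s->state = ST_END_TAG;
            s->name_len = 0;
        } else if (is_name_start(c)) {
            s->state = ST_START_TAG;
            s->name_len = 1;
        } else {
            s->state = ST_ERROR;
        }
        break;
    case ST_START_TAG:
        if (is_name_char(c)) {
            s->name_len++;
        } else if (c == '>') {
            s->depth++;
            s->elements++;
            s->state = ST_TEXT;
        } else {
            s->state = ST_ERROR;  /* e.g. attributes are not supported */
        }
        break;
    case ST_END_TAG:
        if (s->name_len == 0 ? is_name_start(c) : is_name_char(c)) {
            s->name_len++;
        } else if (c == '>' && s->name_len > 0 && s->depth > 0) {
            s->depth--;
            s->state = ST_TEXT;
        } else {
            s->state = ST_ERROR;  /* empty name or unbalanced end tag */
        }
        break;
    case ST_ERROR:
        break;                    /* stay in the error state */
    }
}

/* Returns 1 if text is balanced within the subset, 0 otherwise.
   *elements receives the number of start tags seen. */
int scan_string(const char *text, int *elements)
{
    scanner s;
    scanner_init(&s);
    for (; *text != '\0'; text++)
        scanner_feed(&s, (unsigned char)*text);
    *elements = s.elements;
    return s.state == ST_TEXT && s.depth == 0;
}
```

Calling `scan_string` on `<a><b>hi</b></a>` accepts the input and reports two elements, while `<a b='1'></a>` is rejected because attributes fall outside the subset. Feeding one character at a time keeps the machine usable on streamed input, and the switch maps directly onto a state diagram drawn on paper.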
 
an0047

This question has been answered here several times.
Google for it. Usually, we warn newbies who want to
write their own parsers. You will be surprised about
the tricky details. Have you ever heard of a BOM?
Are you prepared to process 32-bit characters?

Hi, thanks for your answer and for the warning too. As a newbie
I need and want to learn about parsers. Actually, I don't even know if
I'm posting in the right group. Unfortunately I didn't find any
information on the web that satisfied my search, and that is the reason
for my post. The characters are still 8 bits long and I think they will
remain like that. For your amusement, I had great problems handling
chars and strings in C. I don't know which question you are referring to,
but if you could point me to posts that discuss the implementation
(and state machine) of the kind of parser described above, I would
greatly appreciate it. Have a nice day and best regards.
 
an0047

The usual estimate is that a complete XML parser is about the right size
to be a serious term project for a college student who already
understands the basics of writing parsers.

You can rattle off a subset in less time than that. But, again, unless
you have very special needs (such as a language where nobody has written
one yet and which can't link to existing parsers), the question is "why".

Why? I think the answer is in the posts above.
 
Joseph Kesselman

The existing parsers are complex because that's what's required to do a
good job. Supporting a trivial subset of XML is near-trivial, but there
is a lot more that has to be dealt with if you want your code to survive
contact with real data and real users.

Everything should be as simple as possible, but not simpler.

If you want to learn about parsers, implementing a sloppy subset really
won't teach you much.

"Try not. Do! ... Or do not."

There are royalty-free parsers out there, if that's your concern. I
don't know what's available in plain C these days, but Apache's Xerces
parser is available in a C++ version.
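
As one small illustration of the details that lie beyond a trivial subset: even simple text content can contain the five predefined XML entities, which have to be decoded. The sketch below is a hypothetical helper (`decode_entities` is not from any library) that handles only those five. It deliberately rejects everything else, including numeric character references such as `&#65;` and entities declared in a DTD, which a complete parser would have to support.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode the five predefined XML entities from src into dst.
 * dst must have room for strlen(src) + 1 bytes; the output never
 * grows, because every entity is longer than the character it encodes.
 * Returns 0 on success, -1 on an unknown or unterminated reference.
 */
int decode_entities(const char *src, char *dst)
{
    static const struct {
        const char *text;  /* entity as written in the document */
        size_t len;        /* length of text */
        char ch;           /* character it stands for */
    } table[] = {
        { "&lt;",   4, '<'  },
        { "&gt;",   4, '>'  },
        { "&amp;",  5, '&'  },
        { "&apos;", 6, '\'' },
        { "&quot;", 6, '"'  }
    };
    const size_t n = sizeof table / sizeof table[0];
    size_t i;

    while (*src != '\0') {
        if (*src != '&') {
            *dst++ = *src++;       /* ordinary character: copy it */
            continue;
        }
        for (i = 0; i < n; i++) {
            if (strncmp(src, table[i].text, table[i].len) == 0) {
                *dst++ = table[i].ch;
                src += table[i].len;
                break;
            }
        }
        if (i == n) {              /* '&' did not start a known entity */
            *dst = '\0';
            return -1;
        }
    }
    *dst = '\0';
    return 0;
}
```

With this sketch, `a &lt; b` decodes to `a < b`, while a bare ampersand as in `AT&T` is reported as an error, since XML does not allow an unescaped `&` in text.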
 
Joseph Kesselman

Joseph said:
There are royalty-free parsers out there, if that's your concern.

For what it's worth, the W3C's own website just suggests you do a
websearch for "XML parser" to get a list of the available parsers.
Adding "in C" and "free" to that suggests that you might want to look at
libxml2, XMLTok, expat, and possibly others.

(I haven't used any C-based XML parser in years, so I can't offer
opinions on any of these.)
 
Jürgen Kahrs

Hi thanks for your answer and thanks for the warning too. As a newbie
I need and want to learn about parsers. Actually I don't even know if

Use one of these as a starting point:

http://www.grinninglizard.com/tinyxmldocs/index.html
http://www.danoneverythingelse.com/articles/Softwarebxmlnode.html

I'm posting at the right group, unfortunately I didn't found any
information on the web that satisfied my search and that is the reason
of my post. The characters are still 8 bit long and I think they will
remain like that. For your pleasure I had great problems handling

OK, so you will focus on processing XML files that you
produce yourself.

chars and strings under C. I don't know to which question do you refer
but if you could point me to posts that talk about the implementation
(and state machine) of the kind of parser described above I would
kindly appreciate it, have a nice day and best regards

The links above are the main points.
 
an0047

If you want to learn about parsers, implementing a sloppy subset really
won't teach you much.

Then what will teach me, other than using a finished XML
library?

There are royalty-free parsers out there, if that's your concern.

No, that's not my main concern.

I don't know what's available in plain C these days, but Apache's Xerces
parser is available in a C++ version.

I didn't know either, and that is the reason for my post.
 
an0047

(If your parser doesn't support all of XML, it isn't an XML parser.)

It won't be an XML parser. If I manage to finish it, it will be an
own command XML parser.
 
Joe Kesselman

It won't be an XML parser, if I make it to finish it, it will be an
own command XML parser

I don't know what "own command" means in this context, I'm afraid. But
I'm not sure I need to; I've raised the relevant questions, I think.
 
an0047

I don't know what "own command" means in this context, I'm afraid. But
I'm not sure I need to; I've raised the relevant questions, I think.

Own dataset? Thank you for the reference; I just got the book you
recommended.
 
