Need help on File parsing

Maxx

I'm writing a C program that will parse an XML file as its input and
perform specific operations on it.
What I have in mind is to declare a two-dimensional array and store
the XML file in it.

For example: char country[][] = {<countries>,
    <country>,
    <text>Norway</text>,
    <value>N</value>,
    </country>}, and so on.


My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

Thanks
 
Ian Collins

I'm writing a C program that will parse an XML file as its input and
perform specific operations on it.
What I have in mind is to declare a two-dimensional array and store
the XML file in it.

For example: char country[][] = {<countries>,
    <country>,
    <text>Norway</text>,
    <value>N</value>,
    </country>}, and so on.


My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

That's more of a generic programming question than a C one. Have a look
at a common XML parser like libxml; the documentation will give you
ideas even if you choose not to use the library.
 
Ben Bacarisse

Maxx said:
I'm writing a C program that will parse an XML file as its input and
perform specific operations on it.

What specific operations? See below...

What I have in mind is to declare a two-dimensional array and store
the XML file in it.

For example: char country[][] = {<countries>,
    <country>,
    <text>Norway</text>,
    <value>N</value>,
    </country>}, and so on.


My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

It's almost impossible to say without knowing how a piece of data is
going to be accessed (or manipulated).

A good place to post would be comp.programming. If you say what you
propose to do with the XML you should get good help there. Be prepared
to be told that you should use an existing XML parsing library (because
that is almost always the right answer).
 
Nobody

I'm writing a C program that will parse an XML file as its input and
perform specific operations on it.
What I have in mind is to declare a two-dimensional array and store
the XML file in it.
My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

Yes. In fact, it would be hard to imagine a worse way.

First, I wouldn't recommend trying to actually parse the XML yourself, as
you're practically bound to get it wrong. Use an XML parsing library
instead.

XML parsing libraries come in two main flavours: DOM and SAX. DOM
constructs a parse tree for the entire file, which the application can
then query. SAX generates events (reported via callbacks) as it parses the
file; it's up to the application to actually store the data.

Which flavour to use and exactly how to do it depend upon the details of
the application.
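
To make the SAX half of that distinction concrete, here is a minimal sketch of the callback style in plain C. It is not a real XML parser and it does not use any library's API: `toy_scan`, `struct handlers`, `struct state`, `struct country` and `parse_countries` are names made up for this illustration, and the scanner only understands tags without attributes plus plain text. The point is only the shape: the scanner reports events, and the application decides what to store.

```c
#include <assert.h>
#include <string.h>

#define MAX_COUNTRIES 16

/* Hypothetical callback set, in the spirit of a SAX-style interface. */
struct handlers {
    void (*start)(void *user, const char *name);
    void (*end)(void *user, const char *name);
    void (*text)(void *user, const char *chars, size_t len);
};

/* Toy scanner: understands only <name>, </name> and character data.
   No attributes, entities, comments or CDATA. For illustration only. */
static void toy_scan(const char *p, const struct handlers *h, void *user)
{
    while (*p != '\0') {
        if (*p == '<') {
            const char *gt = strchr(p, '>');
            int is_end = (p[1] == '/');
            char name[32];
            size_t len;

            if (gt == NULL)
                return;                      /* malformed input: give up */
            p += is_end ? 2 : 1;
            len = (size_t)(gt - p);
            if (len >= sizeof name)
                len = sizeof name - 1;
            memcpy(name, p, len);
            name[len] = '\0';
            if (is_end) h->end(user, name); else h->start(user, name);
            p = gt + 1;
        } else {
            const char *lt = strchr(p, '<');
            size_t len = lt ? (size_t)(lt - p) : strlen(p);
            h->text(user, p, len);
            p += len;
        }
    }
}

/* Application side: the storage is chosen by the program, not the parser. */
struct country { char text[32]; char value[8]; };

struct state {
    struct country list[MAX_COUNTRIES];
    size_t count;
    char current[32];                        /* element we are inside */
};

static void copy_text(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap) len = cap - 1;           /* truncate rather than overflow */
    memcpy(dst, src, len);
    dst[len] = '\0';
}

static void on_start(void *user, const char *name)
{
    struct state *s = user;
    copy_text(s->current, sizeof s->current, name, strlen(name));
}

static void on_end(void *user, const char *name)
{
    struct state *s = user;
    s->current[0] = '\0';
    if (strcmp(name, "country") == 0 && s->count < MAX_COUNTRIES)
        s->count++;                          /* one record finished */
}

static void on_text(void *user, const char *chars, size_t len)
{
    struct state *s = user;
    struct country *c;
    if (s->count >= MAX_COUNTRIES) return;
    c = &s->list[s->count];
    if (strcmp(s->current, "text") == 0)
        copy_text(c->text, sizeof c->text, chars, len);
    else if (strcmp(s->current, "value") == 0)
        copy_text(c->value, sizeof c->value, chars, len);
}

/* Fills *s from the input and returns the number of countries stored. */
size_t parse_countries(const char *xml, struct state *s)
{
    static const struct handlers h = { on_start, on_end, on_text };
    assert(xml != NULL && s != NULL);
    memset(s, 0, sizeof *s);
    toy_scan(xml, &h, s);
    return s->count;
}
```

Calling `parse_countries` on the countries example from the original post should leave one record in `s.list`. With a real SAX-style library the three callbacks would look much the same; only the scanner would be replaced by the library.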
 
Malcolm McLean

My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

Think of the XML as a tree, and build what is known as a recursive
descent parser.

Basically it's the same problem as a mathematical expression with
deeply nested parentheses, in a slightly different form. You need one
token of lookahead.

Once you've converted the XML to a tree, you'll usually want to walk
the tree to convert to a set of nested arrays, but sometimes it will
be better to keep the data in tree form.
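
As one possible sketch of that idea, here is a recursive descent parser for a tiny XML-like subset, written in C. Everything in it is an assumption made for illustration: the grammar (elements with no attributes, no `<x/>` form, no entities), the fixed-size `struct node`, and the names `parse_element` and `free_tree`. It is meant to show the recursion and the lookahead, not to be a usable XML parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KIDS 8

struct node {
    char name[32];
    char text[64];                 /* character data directly inside */
    struct node *kids[MAX_KIDS];
    int nkids;
};

void free_tree(struct node *n)
{
    int i;
    if (n == NULL) return;
    for (i = 0; i < n->nkids; i++)
        free_tree(n->kids[i]);
    free(n);
}

/* element := '<' name '>' content '</' name '>'
   content := (element | text)*
   On success advances *pp past the element and returns its node;
   returns NULL on any syntax error. */
struct node *parse_element(const char **pp)
{
    const char *p;
    struct node *n;
    size_t len;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    while (isspace((unsigned char)*p)) p++;
    if (*p != '<' || p[1] == '/') return NULL;
    p++;
    len = strcspn(p, ">");
    if (p[len] != '>' || len == 0 || len >= sizeof n->name) return NULL;
    n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    memcpy(n->name, p, len);       /* calloc already zero-terminated it */
    p += len + 1;

    for (;;) {
        if (*p == '\0') goto fail;                 /* unexpected end */
        if (*p == '<' && p[1] == '/') {            /* lookahead: end tag */
            p += 2;
            if (strncmp(p, n->name, len) != 0 || p[len] != '>') goto fail;
            *pp = p + len + 1;
            return n;
        }
        if (*p == '<') {                           /* lookahead: child */
            struct node *kid;
            if (n->nkids == MAX_KIDS) goto fail;
            kid = parse_element(&p);               /* the recursive step */
            if (kid == NULL) goto fail;
            n->kids[n->nkids++] = kid;
        } else {                                   /* character data */
            size_t tlen = strcspn(p, "<");
            size_t i = 0;
            while (i < tlen && isspace((unsigned char)p[i])) i++;
            if (i < tlen) {                        /* skip pure whitespace */
                size_t keep = tlen < sizeof n->text ? tlen : sizeof n->text - 1;
                memcpy(n->text, p, keep);
                n->text[keep] = '\0';
            }
            p += tlen;
        }
    }
fail:
    free_tree(n);
    return NULL;
}
```

Given the countries example, the returned root is named "countries", its single child is "country", and that node's two children carry "Norway" and "N" in their `text` fields. Mismatched tags such as `<a><b></a>` make the function return NULL.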
 
Dr Nick

Malcolm McLean said:
Think of the XML as a tree, and build what is known as a recursive
descent parser.

Basically it's the same problem as a mathematical expression with
deeply nested parentheses, in a slightly different form. You need one
token of lookahead.

Once you've converted the XML to a tree, you'll usually want to walk
the tree to convert to a set of nested arrays, but sometimes it will
be better to keep the data in tree form.

I did it the other way round. First I wrote a good generic "values"
handling system that allowed me to have named strings, integers, lists,
string-indexed-arrays, all as recursive as you like. That was the
difficult bit.

Then I just hooked xmlparse up to it and it sucked the XML in nicely.
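
The poster's actual code is not shown, so what follows is only a guess at the general shape such a "values" system could take in C: a tagged union that can hold an integer, a string, or a list of further values, each optionally named. All of the names here (`struct value`, `value_new`, `value_int`, `value_str`, `list_push`, `list_find`, `value_free`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum vtype { V_INT, V_STR, V_LIST };

struct value {
    enum vtype type;
    char *name;                       /* optional name, may be NULL */
    union {
        long i;
        char *s;
        struct { struct value **items; size_t n; } list;
    } u;
};

static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL) memcpy(d, s, n);
    return d;
}

/* Creates an empty value of the given type; name may be NULL. */
struct value *value_new(enum vtype type, const char *name)
{
    struct value *v = calloc(1, sizeof *v);
    if (v == NULL) return NULL;
    v->type = type;
    if (name != NULL && (v->name = dup_str(name)) == NULL) {
        free(v);
        return NULL;
    }
    return v;
}

struct value *value_int(const char *name, long i)
{
    struct value *v = value_new(V_INT, name);
    if (v != NULL) v->u.i = i;
    return v;
}

struct value *value_str(const char *name, const char *s)
{
    struct value *v = value_new(V_STR, name);
    if (v != NULL && (v->u.s = dup_str(s)) == NULL) {
        free(v->name);
        free(v);
        return NULL;
    }
    return v;
}

/* Appends item to a list value, which takes ownership. 0 on success. */
int list_push(struct value *list, struct value *item)
{
    struct value **grown;
    assert(list != NULL && list->type == V_LIST);
    if (item == NULL) return -1;
    grown = realloc(list->u.list.items,
                    (list->u.list.n + 1) * sizeof *grown);
    if (grown == NULL) return -1;
    grown[list->u.list.n++] = item;
    list->u.list.items = grown;
    return 0;
}

/* String-indexed lookup: first direct child with the given name. */
struct value *list_find(const struct value *list, const char *name)
{
    size_t i;
    assert(list != NULL && list->type == V_LIST);
    for (i = 0; i < list->u.list.n; i++) {
        struct value *v = list->u.list.items[i];
        if (v->name != NULL && strcmp(v->name, name) == 0) return v;
    }
    return NULL;
}

/* Frees a value and, for lists, everything it contains. */
void value_free(struct value *v)
{
    size_t i;
    if (v == NULL) return;
    if (v->type == V_STR) free(v->u.s);
    if (v->type == V_LIST) {
        for (i = 0; i < v->u.list.n; i++) value_free(v->u.list.items[i]);
        free(v->u.list.items);
    }
    free(v->name);
    free(v);
}
```

Because a list can contain other lists, an XML element maps naturally onto a named `V_LIST` whose items are its children, which is presumably what makes hooking a parser up to such a structure straightforward.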

Think hard about what, if anything, you want to distinguish between:

<stuff>
<item>fred</item>
</stuff>

<stuff item="fred"/>

To summarise - you need more of a specification of the problem before
starting to find a solution.
 
Maxx

I'm writing a C program that will parse an XML file as its input and
perform specific operations on it.
What I have in mind is to declare a two-dimensional array and store
the XML file in it.
For example: char country[][] = {<countries>,
    <country>,
    <text>Norway</text>,
    <value>N</value>,
    </country>}, and so on.
My question is: is there a better way to do this, i.e. is there a
better way to store the XML input?

That's more of a generic programming question than a C one.  Have a look
at a common XML parser like libxml, the documentation will give you
ideas even if you choose not to use the library.

All right, I've looked up libxml and seem to have hit the jackpot. It
contains the functions I need.
Thanks
 
Maxx

Yes. In fact, it would be hard to imagine a worse way.

First, I wouldn't recommend trying to actually parse the XML yourself, as
you're practically bound to get it wrong. Use an XML parsing library
instead.

XML parsing libraries come in two main flavours: DOM and SAX. DOM
constructs a parse tree for the entire file, which the application can
then query. SAX generates events (reported via callbacks) as it parses the
file; it's up to the application to actually store the data.

Which flavour to use and exactly how to do it depend upon the details of
the application.

Actually, the XML file that I'm going to give the program will always
have a predefined format, like the example I gave above. The program
will always parse the same format, extract the values from the fields,
and write another XML file using the same template. So I was looking
for the easiest way to solve it, without having to call extensive
library functions.

Anyway, thanks.
 
Maxx

Think of the XML as a tree, and build what is known as a recursive
descent parser.

Basically it's the same problem as a mathematical expression with
deeply nested parentheses, in a slightly different form. You need one
token of lookahead.

Once you've converted the XML to a tree, you'll usually want to walk
the tree to convert to a set of nested arrays, but sometimes it will
be better to keep the data in tree form.

Yeah, I had this concept in mind at first, but since I was going to
write a simple program that just extracts values from a set of
predefined fields, I avoided going into trees. I reckon a tree would be
the best solution, but I'm still quite new to trees.

Thanks
 
Maxx

I did it the other way round. First I wrote a good generic "values"
handling system that allowed me to have named strings, integers, lists,
string-indexed-arrays, all as recursive as you like. That was the
difficult bit.

Then I just hooked xmlparse up to it and it sucked the XML in nicely.

Think hard about what, if anything, you want to distinguish between:

<stuff>
<item>fred</item>
</stuff>

<stuff item="fred"/>

To summarise - you need more of a specification of the problem before
starting to find a solution.

Yes, a generic list of values would be helpful, but I need more ideas
on how to implement it. I'm trying to avoid library functions in this
program, as it will always parse the same fields over and over again.
 
David Resnick

Actually, the XML file that I'm going to give the program will always
have a predefined format, like the example I gave above. The program
will always parse the same format, extract the values from the fields,
and write another XML file using the same template. So I was looking
for the easiest way to solve it, without having to call extensive
library functions.

Note that it always starts this way. It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser? But then
there are modifications/extensions/etc. People hand edit the file and
add white space, which won't confuse a parser but messes up your less
flexible hand parse. People write a mixture of <element></element>
instead of <element/>, which should parse as equivalent and somehow
don't when hand parsing. People suddenly want validation. etc.
Going with a real parser is very much the way to go in a real
application, much more future friendly even if not apparently needed
up front...
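
A small sketch of how that goes wrong, assuming the hand parse is done by literal string matching (the function name `naive_extract` and its interface are made up for this example). It works on the exact format it was written for and fails on inputs that XML treats as equally valid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Naive hand parse: look for the literal "<tag>" and "</tag>" and copy
   whatever lies between them. Returns 0 on success, -1 if not found. */
int naive_extract(const char *xml, const char *tag, char *out, size_t cap)
{
    char open[40], close[40];
    const char *start, *end;
    size_t len;

    assert(xml != NULL && tag != NULL && out != NULL && cap > 0);
    snprintf(open, sizeof open, "<%s>", tag);
    snprintf(close, sizeof close, "</%s>", tag);

    start = strstr(xml, open);
    if (start == NULL) return -1;
    start += strlen(open);
    end = strstr(start, close);
    if (end == NULL) return -1;

    len = (size_t)(end - start);
    if (len >= cap) len = cap - 1;     /* truncate rather than overflow */
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

`naive_extract("<value>N</value>", "value", ...)` succeeds, but `"<value >N</value>"` (a space before `>` is legal XML) returns -1, and so does the empty-element form `"<value/>"` even though `"<value></value>"` is accepted. A value that someone has reindented onto its own line comes back with the surrounding newlines and spaces attached. Each case can be patched, which is exactly the constant tweaking described in this thread.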
 
John Bode

Note that it always starts this way.  It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser?  But then
there are modifications/extensions/etc.  People hand edit the file and
add white space, which won't confuse a parser but messes up your less
flexible hand parse.  People write a mixture of <element></element>
instead of <element/>, which should parse as equivalent and somehow
don't when hand parsing.  People suddenly want validation.  etc.
Going with a real parser is very much the way to go in a real
application, much more future friendly even if not apparently needed
up front...

Not to mention it's code that *you* don't have to write or test.

Figuring out how to use the library in your code will take less time
than writing a robust parser from scratch. Yes, you can hand-hack a
minimal, non-validating, less-than-totally-robust XML parser in an
afternoon (I've done it), but you'll be tweaking that sucker
*constantly* (which I did as well).
 
Michael Press

David Resnick said:
Note that it always starts this way. It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser? But then
there are modifications/extensions/etc. People hand edit the file and
add white space, which won't confuse a parser but messes up your less
flexible hand parse. People write a mixture of <element></element>
instead of <element/>, which should parse as equivalent and somehow
don't when hand parsing. People suddenly want validation. etc.
Going with a real parser is very much the way to go in a real
application, much more future friendly even if not apparently needed
up front...

XML is the same as csh. Every time somebody raises a
problem with XML somebody else steps in and presents an
easy workaround. Eventually you are told not even to
try writing a parser. It is the death of a thousand
cuts. And for what?

XML gives PHBs the illusion that they know about
programming; and adventurers a cozy berth. XML is a scam.

Has XML gotten to the point a universal Turing machine
could be written in XML, or is it still singing "Daisy"?
 
David Resnick

XML is the same as csh. Every time somebody raises a
problem with XML somebody else steps in and presents an
easy workaround. Eventually you are told not even to
try writing a parser. It is the death of a thousand
cuts. And for what?

XML gives PHBs the illusion that they know about
programming; and adventurers a cozy berth. XML is a scam.

Has XML gotten to the point a universal Turing machine
could be written in XML, or is it still singing "Daisy"?

XML is great in its place. I'm not a PHB, and I don't believe
it to be a scam. I love it for flat files that need
structured information and flexibility. Easy to extend,
easy (with XPath queries, say) to get stuff out of.
Standard, everyone knows what it means, how to add
to it, how to parse and validate it. Does it solve
all problems in the world? Of course not...

-David
 
Nobody

Note that it always starts this way. It is easy to hand parse the XML
if it is in a truly fixed format,

If you restrict the application to reading a subset of XML, that defeats
the purpose of using XML in the first place.

You can find a wide range of tools which can process XML, but the range of
tools which can process a particular custom subset of XML is likely to be
much smaller (i.e. those tools which you write yourself).

If you think that you only need to support files written by a particular
program, you're likely to end up only supporting files which were directly
written by that program and not post-processed in any way. This often
makes your program less useful than you had originally assumed.
 
Malcolm McLean

Figuring out how to use the library in your code will take less time
than writing a robust parser from scratch.  Yes, you can hand-hack a
minimal, non-validating, less-than-totally-robust XML parser in an
afternoon (I've done it), but you'll be tweaking that sucker
*constantly* (which I did as well).

The problem is that it becomes harder to distribute the program. Even
if you have source to the library, it's often in messy files that are
hard to integrate and distract the reader from the actual logical core
of the program.
 
David Resnick

If you restrict the application to reading a subset of XML, that defeats
the purpose of using XML in the first place.

You can find a wide range of tools which can process XML, but the range of
tools which can process a particular custom subset of XML is likely to be
much smaller (i.e. those tools which you write yourself).

If you think that you only need to support files written by a particular
program, you're likely to end up only supporting files which were directly
written by that program and not post-processed in any way. This often
makes your program less useful than you had originally assumed.

Holy out of context quotes, Batman. Your reply misses the entire
point of mine, which is that hand parsing is a bad idea. Did you read the
rest of the post or just answer after the first 2 lines?

-David
 
Nobody

Holy out of context quotes, Batman. Your reply misses the entire
point of mine, which is that hand parsing is a bad idea. Did you read the
rest of the post or just answer after the first 2 lines?

I wasn't "replying" to your comments. I elaborated on your reply,
providing more reasons why it's a bad idea to assume that you only need
to handle a subset.
 
David Resnick

I wasn't "replying" to your comments. I elaborated on your reply,
providing more reasons why it's a bad idea to assume that you only need
to handle a subset.

Just seemed to be replying to my comments, as that was the only quoted
text being addressed. My mistake.

-David
 
Maxx

Note that it always starts this way.  It is easy to hand parse the XML
if it is in a truly fixed format, so why use a real parser?  But then
there are modifications/extensions/etc.  People hand edit the file and
add white space, which won't confuse a parser but messes up your less
flexible hand parse.  People write a mixture of <element></element>
instead of <element/>, which should parse as equivalent and somehow
don't when hand parsing.  People suddenly want validation.  etc.
Going with a real parser is very much the way to go in a real
application, much more future friendly even if not apparently needed
up front...

I'm using the parser so that I can extract the necessary values from
specific fields. Anyway, I have decided to go with a real parser, as
it's becoming too cumbersome.


Thanks
 
