Newbie question: parse XML file


EF

Hi

I have an XML file that looks like this:
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

Thanks
 

David Squire

EF said:
I have an XML file that looks like this:
[XML snipped]

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

I take it you are using a module such as XML::Parser?

One way to handle this is to set a flag to some OK value in the start
handler section that handles entities elements, then in the section of
the start handler that handles Comp elements, do what you need to do iff
the flag is OK, then set it to a not OK value.

DS

PS. Having an element called "entities" is likely to cause confusion in
the XML world, where "entity" already names a specific XML construct.
 

Michel Rodriguez

EF said:
I have an XML file that looks like this:
[XML snipped]

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

Hi,

Any module that offers XPath support of some sort will make it easy.

With XML::Twig you can do this:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my @ids;

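# '/entities/Comp' fires only for Comp elements that are direct children
# of the root <entities>; in the handler, $_ is the element and ->id
# returns its id attribute
XML::Twig->new(
    twig_handlers => { '/entities/Comp' => sub { push @ids, $_->id } },
)->parse(\*DATA);

print "$_\n" for @ids;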

__DATA__
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>
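Since any module with XPath support makes this easy, here is the same query sketched with XML::LibXML instead of XML::Twig. This assumes XML::LibXML is installed; the helper name top_level_comp_ids is hypothetical. The absolute path /entities/Comp matches only Comp elements whose parent is the root entities element, so the nested Comp 129 inside Clone is never selected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: return the id attribute of every Comp that is a
# direct child of the root <entities> element, in document order.
sub top_level_comp_ids {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    # Absolute XPath: /entities/Comp does not descend into <Clone>
    return map { $_->getAttribute('id') } $doc->findnodes('/entities/Comp');
}

my $xml = <<'XML';
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>
XML

my @ids = top_level_comp_ids($xml);
print "$_\n" for @ids;    # prints 123 and 124, one per line
```

Changing the path to //Comp would select all three Comp elements, which is exactly what the question wants to avoid.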
 

David Squire

David said:
....


One way to handle this is to set a flag to some OK value in the start
handler section that handles entities elements, then in the section of
the start handler that handles Comp elements, do what you need to do iff
the flag is OK, then set it to a not OK value.

Sorry, I misread the question. For some reason I thought you wanted
only the first Comp child of entities. Please ignore.
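For the question as actually asked, the flag idea can still be rescued by turning the flag into a depth counter maintained by the Start and End handlers: a Comp that starts while exactly one element (the root) is open is a top-level one. A minimal sketch, assuming XML::Parser is installed; the variable names are illustrative, and it checks depth rather than the root's name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>
XML

my $depth = 0;    # number of elements currently open
my @ids;

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            # depth 1 here means only the root (<entities>) is open,
            # so this Comp is a direct child of the root
            push @ids, $attr{id} if $element eq 'Comp' && $depth == 1;
            $depth++;
        },
        # empty elements such as <Comp id="129" /> also trigger End
        End => sub { $depth-- },
    },
);
$parser->parse($xml);

print "$_\n" for @ids;    # prints 123 and 124, one per line
```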

DS
 

Sherm Pendley

EF said:
I have an XML file that looks like this:
[XML snipped]

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

What have you tried so far? What were the results, and how were they
different from the results you expected?

Have you read the posting guidelines for this group yet? It's generally
expected that you give it your best shot first, and then ask for help if
you get stuck.

sherm--
 

robic0

EF said:
Hi

I have an XML file that looks like this:
[XML snipped]

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

Thanks

I'm assuming the tag names, as well as the rest of the XML, are placeholders
for some other real XML.
It's like looking at a bad variable-naming convention in Perl.
'<entities>' especially: although XML has a separate <!ENTITY declaration,
the word itself denotes a substitution of content data.

Just an observation. Perhaps you should read and understand XML before you try to
work on it. Here's a good reference:

http://www.w3.org/TR/xml11/

Not very popular reading, granted...
 

John

Jim Gibson said:
EF said:
Hi

I have an XML file that looks like this:

[XML snipped, see below]
I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

Use an XML parser such as XML::Simple that puts the data into a tree
structure and extract only the items you want from an explicit level in
the tree:

#!/usr/local/bin/perl
#
use strict;
use warnings;
use XML::Simple;

# slurp the whole DATA section without changing $/ globally
my $string = do { local $/; <DATA> };
my $xml = XMLin($string);
my @keys = keys %{$xml->{Comp}};
print "Comp: @keys\n";

__END__
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>

__Output__

Comp: 124 123


I always use XML::Simple, but I would add ForceArray => 1, SuppressEmpty => 1.

my ($response) = @_;    # XML string
my $xml = XML::Simple->new(ForceArray => 1, SuppressEmpty => 1);    # create object
# wrap in a dummy root element: XMLin strips the root, so <entities> survives as a key
my $data = $xml->XMLin("<ignore>" . $response . "</ignore>");

Everything is then in a nested structure of hashes and arrays and can be accessed directly.
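A minimal sketch of those settings (ForceArray => 1, SuppressEmpty => 1, dummy <ignore> root) applied to the thread's data, assuming XML::Simple is installed. With ForceArray => 1 and XML::Simple's default KeyAttr (name, key, id), the Comp elements directly under entities are folded into a hash keyed by their id, so the top-level ids are just the keys of $data->{entities}[0]{Comp}; the nested Comp 129 is folded one level deeper, under Clone.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

my $response = <<'XML';
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>
XML

my $xs = XML::Simple->new(ForceArray => 1, SuppressEmpty => 1);
# XMLin discards the root element, so the dummy <ignore> root keeps
# <entities> visible as $data->{entities}
my $data = $xs->XMLin("<ignore>" . $response . "</ignore>");

# ForceArray => 1 makes every element an array ($data->{entities}[0]);
# default KeyAttr then folds the Comp array into a hash keyed on id.
# Hash order is random, so sort the ids numerically.
my @ids = sort { $a <=> $b } keys %{ $data->{entities}[0]{Comp} };
print "Comp: @ids\n";    # Comp: 123 124
```

ForceArray also protects against the case where entities contains a single Comp: without it, $xml->{Comp} would be that element's own hash, and taking its keys would return attribute and child names instead of ids.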

Regards
John
 

John Bokma

EF said:
Hi

I have an XML file that looks like this:
[XML snipped]

I need a way to get the id numbers of only the top-level Comp elements
(directly under entities), not of every Comp in the XML file.

Alternative solution, using XML::Parser:

http://johnbokma.com/perl/element-id-for-given-parent.html
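A minimal XML::Parser sketch of the "element id for a given parent" idea (not necessarily what the linked page does), assuming XML::Parser is installed. Instead of counting depth by hand, it asks the parser for the currently open ancestor elements through the Expat context method and keeps a Comp only when that list is exactly (entities).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $xml = <<'XML';
<entities>
<Comp id="123">
<Description max_length="" reference_to="" type="multiline_string"/>
<Clone max_length="" reference_to="Comp" type="reference_list">
<Comp id="129" />
</Clone>
</Comp>
<Comp id="124">
<Description max_length="" reference_to="" type="multiline_string"/>
</Comp>
</entities>
XML

my @ids;
XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element, %attr) = @_;
            # Inside a start handler, context() lists the open ancestors
            # of $element, outermost first; the last entry is the parent.
            my @ancestors = $expat->context;
            push @ids, $attr{id}
                if $element eq 'Comp'
                && @ancestors == 1
                && $ancestors[0] eq 'entities';
        },
    },
)->parse($xml);

print "$_\n" for @ids;    # prints 123 and 124, one per line
```

Comparing the whole ancestor list (rather than only the parent) is what excludes a Comp nested inside some other element that happens to be named entities.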
 
