Losing data while parsing XML with Expat



Fabian Krüger

Hello,

I've got a weird problem and need your help and ideas...

I've written a PHP application which imports data in XML format and
writes it to a MySQL database for faster access.

The application uses Expat 1.95.7 via PHP to parse the XML data.

At first everything seemed to work fine. But now I've noticed that something
goes wrong:

If the amount of XML data is larger than what I used for testing the
application (we're talking about something between 2 and 4 MB), some
data gets lost.

As long as the structure of the file doesn't change, the lost data is always
the same.

But if I change the structure of the file, e.g. by adding a line
somewhere, the problem occurs in a different place.

For example:

<event>
<SysId>27</SysId>
<ClientId>1</ClientId>
<EventNo>9402</EventNo>
<EventName>Martin Schneider Karben</EventName>
<category>
<Type>Keine Veranstaltungsart</Type>
</category>
.....
</event>

Let's assume that the "Mar" of the data between the <EventName> tags gets
lost and we get "tin Schneider Karben".

When I insert a line above the <event> block, the "t" from "tin" gets
lost as well, so we have "in Schneider Karben".

Why?

I also tried to dynamically generate parts of the XML data with PHP:


//--------------- CODE ------------------------------------------------------//
<?php
// num of datasets
$datasets = 2000;
// build the xml string
// start a fresh string ($str was previously appended to without being initialised)
$str = '<?xml version="1.0" encoding="ISO-8859-1"?><program xmlns="http://www.orestes.de">'."\n";
for($i=0; $i<$datasets; $i++){
$str .= '<event>
<SysId>27</SysId>
<ClientId>1</ClientId>
<EventNo>'.$i.'</EventNo>
<EventName>NUM'.$i.'</EventName>
<category>
<Type>Keine Veranstaltungsart</Type>
</category>
<location>
<Name>location_name_'.$i.'</Name>
<Street>Strasse</Street>
<ZIP>32333</ZIP>
<City>City</City>
<Country></Country>
</location>
<Currency>EUR</Currency>
<show>
<ShowNo>1</ShowNo>
<ShowDate>31.12.2004</ShowDate>
<ShowTime>20:00</ShowTime>
<ShowWeekday>Freitag</ShowWeekday>
<ShowPage href="32160001.jsp">TPP Gutscheine</ShowPage>
<Info></Info>
<block number="0">
<FreeSeats>61</FreeSeats>
</block>
</show>
</event>';
}
$str .= "</program>";
// write the data to file
$fp = fopen("../DATA/elektra.xml","w");
fputs($fp, $str);
fclose($fp);
?>
//--------------- CODE END --------------------------------------------------//



With this generated file, NUM1644 becomes 1644 and NUM1195 becomes 5.
All other data is parsed correctly?!


Here is the code of the two classes used for parsing and importing:


//--------------- CODE ------------------------------------------------------//
<?php
require_once "DB.php";

class ElektraImporter
{
var $FileHash;
var $DAO;
var $XMLDataFile;

function ElektraImporter(){
$this->XMLDataFile = Config::getAttribute("Config/Config_Base",
"elektra_xml");

$DB = DB::connect(Config::getAttribute("Config/Config_Base",
"dsn"));
if(DB::isError($DB)){die(DB::ErrorMessage($DB));}
$this->DAO = Loader::buildObject("XML/ElektraDAO", null, $DB);
}
/**
* checks for changes on the elektra xml data.
* If there are changes the database will be refreshed
*/
function checkForUpdate(){
/* if there are changes */
if($this->_hasElektraFileChanged($this->DAO->getElektraFileHashCode())){
/* read the file and update the database */
$this->DAO->updateElektraData($this->_getElektraData());
} else {
/* everything is o.k. */
}
}
/**
* parse the xml file and get the needed data
* @return array $data
*/
function _getElektraData(){
$Parser = &Loader::buildObject("XML/ElektraParser", null,
array(&$arr));
if( PEAR::isError($Parser) ){
die ($Parser->getMessage());
}
/* setInputFile() returns a PEAR_Error on failure, so check its result */
$result = $Parser->setInputFile($this->XMLDataFile);
if(PEAR::isError($result)){ die($result->getMessage()); }

$data = $Parser->getXMLData();

$data['filehash'] = md5_file($this->XMLDataFile);

return $data;
}
/**
* checks if the file has changed
* @return boolean
*/
function _hasElektraFileChanged($filehash = ""){
$this->FileHash = md5_file($this->XMLDataFile);

if($filehash == $this->FileHash){
return false;
} else {
return true;
}
}
}
?>
//--------------- CODE END --------------------------------------------------//


The parser class, which extends PEAR's XML_Parser:


//--------------- CODE ------------------------------------------------------//
<?php
require_once "XML/Parser.php";


class ElektraParser extends XML_Parser
{
var $XMLData;
var $EventNo;
var $EventName;
var $LastEventNo;
var $ActualEventNo;
var $EventCnt = 0;
var $ShowCnt = 0;

function ElektraParser(&$arr){
$this->XMLData = &$arr;
$this->XML_Parser("ISO-8859-1", "event", "ISO-8859-1");
}

function startHandler($xp, $element, $attribs) {
$this->Element = $element;
$this->Attribs = $attribs;
}

function endHandler($xp, $element) {
if ( $element == "EVENT" ){
/* increase event counter */
$this->EventCnt++;
/* set show counter to 0 */
$this->ShowCnt = 0;
}
elseif ( $element == "SHOW" ){
/* increase show counter for the next show */
$this->ShowCnt ++;
}
$this->Element = "";
}

function cdataHandler($xp, $cdata) {
if($this->Element == "DATE"){
$this->XMLData['creationdate'] = $cdata;
}
elseif($this->Element == "TIME"){
$this->XMLData['creationtime'] = $cdata;
}
/* every event has a sysid the sysid and the eventno make the unique
eventid */
elseif($this->Element == "SYSID"){
$this->XMLData['event'][$this->EventCnt]['sysid'] = $cdata;
}
elseif($this->Element == "CLIENTID"){
$this->XMLData['event'][$this->EventCnt]['clientid'] = $cdata;
}
elseif($this->Element == "EVENTNO"){
$this->XMLData['event'][$this->EventCnt]['eventno'] = $cdata;
}
elseif($this->Element == "EVENTNAME"){
$this->XMLData['event'][$this->EventCnt]['eventname'] = $cdata;
}
elseif($this->Element == "NAME"){
$this->XMLData['event'][$this->EventCnt]['location'] = $cdata;
}
elseif($this->Element == "CITY"){
$this->XMLData['event'][$this->EventCnt]['city'] = $cdata;

/* eventgroups */
/* get the position of the first occurence of the city in the
eventname */
$pos = strpos($this->XMLData['event'][$this->EventCnt]['eventname'],
$cdata);
/* if there´s the city in the name */
if( $pos ){
$this->XMLData['event'][$this->EventCnt]['group'] =
trim(substr($this->XMLData['event'][$this->EventCnt]['eventname'], 0,
$pos));
}
/* otherwise we take the whole eventname as group */
else {
$this->XMLData['event'][$this->EventCnt]['group'] =
trim($this->XMLData['event'][$this->EventCnt]['eventname']);
}
}
/* get the shows */
elseif($this->Element == "SHOWNO") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showno']
= $cdata;
}
elseif($this->Element == "SHOWDATE") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showdate']
= $cdata;
}
elseif($this->Element == "SHOWTIME") {
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showtime']
= $cdata;
}
elseif($this->Element == "SHOWPAGE"){
$this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showpage']
= $this->Attribs['HREF'];
}
}
function defaultHandler($xp, $cdata) {

}
function &getXMLData(){
$p = $this->parse();
if(PEAR::isError($p)){ die($p->getMessage()); }
return $this->XMLData;
}
}
?>
//--------------- CODE END --------------------------------------------------//



This problem is really bad because event IDs have been truncated as well,
and then my SQL statements didn't work anymore!

I have no idea what the reason is or even might be =(
A bug in Expat? I can't really believe that.
Badly formatted XML? Not really.
Problems with Expat's memory management?
Or just my fault? If so, where?

But the problem seems to be tied to the layout of the XML
file.
If I take out line breaks or add lines, the error occurs in other
places.
But the same structure always produces the same errors.
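
One possibility worth ruling out, offered as a guess rather than a diagnosis: an Expat-style parser is allowed to deliver the text of a single element to the character data handler in several pieces, for instance when the text straddles the boundary between two chunks of input. A handler that assigns each piece (as cdataHandler above does) would then keep only the last piece, which would fit "NUM1644" turning into "1644" and the loss moving whenever the file layout shifts. The sketch below illustrates the append-instead-of-assign idea with PHP's plain xml_* functions rather than the ElektraParser class. The function collectEventNames() and the two-chunk input are made up for this example, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not part of the original classes): feeds $chunks to the
// parser one after another and returns the text of every <EventName> element.
function collectEventNames(array $chunks)
{
    $names  = array();  // finished EventName values
    $buffer = '';       // text collected for the element currently open
    $parser = xml_parser_create('ISO-8859-1');

    xml_set_element_handler(
        $parser,
        function ($xp, $element, $attribs) use (&$buffer) {
            $buffer = '';                   // new element: start with an empty buffer
        },
        function ($xp, $element) use (&$buffer, &$names) {
            if ($element == 'EVENTNAME') {  // case folding is on by default
                $names[] = $buffer;         // store only once the element is complete
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($xp, $cdata) use (&$buffer) {
            $buffer .= $cdata;              // append, never overwrite
        }
    );

    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // the third argument tells the parser whether this is the final chunk
        if (!xml_parse($parser, $chunk, $i == $last)) {
            die(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $names;
}

// The chunk boundary falls in the middle of "NUM1644" on purpose.
$names = collectEventNames(array(
    '<program><event><EventName>NUM',
    '1644</EventName></event></program>',
));
print_r($names);
```

Whether the parser really splits the text at that point depends on the underlying library, but appending yields the complete value either way. In the PHP 4 style class above, the buffer would be a member variable that is cleared in startHandler and stored in endHandler.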


My XML skills are not that good, so I would be very grateful for any
ideas or advice.

Thanks for your advice.

With best regards

Fabian Krüger
 