Losing data while parsing XML with Expat

Discussion in 'XML' started by Fabian Krüger, Nov 19, 2003.

  1. Hello,

    I have a weird problem and need your help and ideas...

    I've written a PHP application which imports data in XML format and
    writes this data to a MySQL database for faster access.

    The application uses Expat 1.95.7 via PHP to parse the XML data.

    At first everything seemed to work fine. But now I have noticed that
    something goes wrong:

    If the amount of XML data is larger than what I used for testing the
    application (we're talking about something between 2 and 4 MB), some
    data gets lost.

    If the structure of the file doesn't change, the lost data is always
    the same.

    But if I change the structure of the file, e.g. by adding a line
    somewhere, the problem occurs in another place.

    For example:

    <event>
      <SysId>27</SysId>
      <ClientId>1</ClientId>
      <EventNo>9402</EventNo>
      <EventName>Martin Schneider Karben</EventName>
      <category>
        <Type>Keine Veranstaltungsart</Type>
      </category>
      .....
    </event>

    Let's assume that "Mar" of the data between the <EventName> tags gets
    lost and we get "tin Schneider Karben".

    When I insert a line above the <event> block, the "t" from "tin" also
    gets lost, so we have "in Schneider Karben".

    Why?

    I also tried to dynamically generate parts of the XML data with PHP:


    //--------------- CODE ------------------------------------------------------//
    <?php
    // number of datasets
    $datasets = 2000;
    // build the XML string (initialise it first, then append to it)
    $str = '<?xml version="1.0" encoding="ISO-8859-1"?><program xmlns="http://www.orestes.de">'."\n";
    for($i=0; $i<$datasets; $i++){
    $str .= '<event>
    <SysId>27</SysId>
    <ClientId>1</ClientId>
    <EventNo>'.$i.'</EventNo>
    <EventName>NUM'.$i.'</EventName>
    <category>
    <Type>Keine Veranstaltungsart</Type>
    </category>
    <location>
    <Name>location_name_'.$i.'</Name>
    <Street>Strasse</Street>
    <ZIP>32333</ZIP>
    <City>City</City>
    <Country></Country>
    </location>
    <Currency>EUR</Currency>
    <show>
    <ShowNo>1</ShowNo>
    <ShowDate>31.12.2004</ShowDate>
    <ShowTime>20:00</ShowTime>
    <ShowWeekday>Freitag</ShowWeekday>
    <ShowPage href="32160001.jsp">TPP Gutscheine</ShowPage>
    <Info></Info>
    <block number="0">
    <FreeSeats>61</FreeSeats>
    </block>
    </show>
    </event>';
    }
    $str .= "</program>";
    // write the data to file
    $fp = fopen("../DATA/elektra.xml","w");
    fputs($fp, $str);
    fclose($fp);
    ?>
    //--------------- CODE END --------------------------------------------------//



    With this generated file, NUM1644 becomes 1644 and NUM1195 becomes 5.
    All other data is parsed correctly.
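
    To find out exactly which records are affected, the parsed result could be
    compared against what the generator script wrote. The following is only a
    minimal sketch: the function name findTruncatedEvents is made up for
    illustration, and it assumes the array layout produced by the parser class
    in this post ('eventno' and 'eventname' per event, indexed in file order).

```php
<?php
// Hypothetical helper, not part of the original application.
// The generator writes EventNo = $i and EventName = "NUM$i" for the i-th
// event, so any record that differs from that has lost data while parsing.
function findTruncatedEvents(array $events)
{
    $broken = array();
    foreach ($events as $i => $event) {
        $expectedNo   = (string) $i;
        $expectedName = 'NUM' . $i;
        if ($event['eventno'] !== $expectedNo || $event['eventname'] !== $expectedName) {
            $broken[$i] = $event;
        }
    }
    return $broken;
}

// Tiny hand-made sample standing in for the parsed data.
$sample = array(
    0 => array('eventno' => '0', 'eventname' => 'NUM0'),
    1 => array('eventno' => '1', 'eventname' => '1'),    // "NUM" was lost
    2 => array('eventno' => '2', 'eventname' => 'NUM2'),
);
$broken = findTruncatedEvents($sample);
print_r(array_keys($broken));
```

    Run against the real import, the offsets of the affected records in the
    file may show whether the losses line up with some fixed block size.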


    Here is the code of the two classes used for parsing and importing:


    //--------------- CODE ------------------------------------------------------//
    <?php
    require_once "DB.php";

    class ElektraImporter
    {
    var $FileHash;
    var $DAO;
    var $XMLDataFile;

    function ElektraImporter(){
    $this->XMLDataFile = Config::getAttribute("Config/Config_Base",
    "elektra_xml");

    $DB = DB::connect(Config::getAttribute("Config/Config_Base",
    "dsn"));
    if(DB::isError($DB)){die(DB::ErrorMessage($DB));}
    $this->DAO = Loader::buildObject("XML/ElektraDAO", null, $DB);
    }
    /**
    * checks for changes on the elektra xml data.
    * If there are changes the database will be refreshed
    */
    function checkForUpdate(){
    /* if there are changes */
    if($this->_hasElektraFileChanged($this->DAO->getElektraFileHashCode())){
    /* read the file and update the database */
    $this->DAO->updateElektraData($this->_getElektraData());
    } else {
    /* everything is o.k. */
    }
    }
    /**
    * parse the xml file and get the needed data
    * @return array $data
    */
    function _getElektraData(){
    $Parser = &Loader::buildObject("XML/ElektraParser", null, array(&$arr));
    if( PEAR::isError($Parser) ){
    die ($Parser->getMessage());
    }
    /* setInputFile() reports failure through its return value, so check that */
    $result = $Parser->setInputFile($this->XMLDataFile);
    if(PEAR::isError($result)){ die($result->getMessage()); }

    $data = $Parser->getXMLData();

    $data['filehash'] = md5_file($this->XMLDataFile);

    return $data;
    }
    /**
    * checks if the file has changed
    * @return boolean
    */
    function _hasElektraFileChanged($filehash = ""){
    $this->FileHash = md5_file($this->XMLDataFile);

    if($filehash == $this->FileHash){
    return false;
    } else {
    return true;
    }
    }
    }
    ?>
    //--------------- CODE END --------------------------------------------------//


    The parser class, which extends PEAR's XML_Parser:


    //--------------- CODE ------------------------------------------------------//
    <?php
    require_once "XML/Parser.php";


    class ElektraParser extends XML_Parser
    {
    var $XMLData;
    var $EventNo;
    var $EventName;
    var $LastEventNo;
    var $ActualEventNo;
    var $EventCnt = 0;
    var $ShowCnt = 0;

    function ElektraParser(&$arr){
    $this->XMLData = &$arr;
    $this->XML_Parser("ISO-8859-1", "event", "ISO-8859-1");
    }

    function startHandler($xp, $element, $attribs) {
    $this->Element = $element;
    $this->Attribs = $attribs;
    }

    function endHandler($xp, $element) {
    if ( $element == "EVENT" ){
    /* increase event counter */
    $this->EventCnt++;
    /* set show counter to 0 */
    $this->ShowCnt = 0;
    }
    elseif ( $element == "SHOW" ){
    /* increase show counter for the next show */
    $this->ShowCnt ++;
    }
    $this->Element = "";
    }

    function cdataHandler($xp, $cdata) {
    if($this->Element == "DATE"){
    $this->XMLData['creationdate'] = $cdata;
    }
    elseif($this->Element == "TIME"){
    $this->XMLData['creationtime'] = $cdata;
    }
    /* every event has a sysid; the sysid and the eventno together form the
    unique eventid */
    elseif($this->Element == "SYSID"){
    $this->XMLData['event'][$this->EventCnt]['sysid'] = $cdata;
    }
    elseif($this->Element == "CLIENTID"){
    $this->XMLData['event'][$this->EventCnt]['clientid'] = $cdata;
    }
    elseif($this->Element == "EVENTNO"){
    $this->XMLData['event'][$this->EventCnt]['eventno'] = $cdata;
    }
    elseif($this->Element == "EVENTNAME"){
    $this->XMLData['event'][$this->EventCnt]['eventname'] = $cdata;
    }
    elseif($this->Element == "NAME"){
    $this->XMLData['event'][$this->EventCnt]['location'] = $cdata;
    }
    elseif($this->Element == "CITY"){
    $this->XMLData['event'][$this->EventCnt]['city'] = $cdata;

    /* eventgroups */
    /* get the position of the first occurrence of the city in the
    eventname */
    $pos = strpos($this->XMLData['event'][$this->EventCnt]['eventname'],
    $cdata);
    /* if the city appears in the name */
    if( $pos ){
    $this->XMLData['event'][$this->EventCnt]['group'] =
    trim(substr($this->XMLData['event'][$this->EventCnt]['eventname'], 0,
    $pos));
    }
    /* otherwise we take the whole eventname as group */
    else {
    $this->XMLData['event'][$this->EventCnt]['group'] =
    trim($this->XMLData['event'][$this->EventCnt]['eventname']);
    }
    }
    /* get the shows */
    elseif($this->Element == "SHOWNO") {
    $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showno']
    = $cdata;
    }
    elseif($this->Element == "SHOWDATE") {
    $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showdate']
    = $cdata;
    }
    elseif($this->Element == "SHOWTIME") {
    $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showtime']
    = $cdata;
    }
    elseif($this->Element == "SHOWPAGE"){
    $this->XMLData['event'][$this->EventCnt]['show'][$this->ShowCnt]['showpage']
    = $this->Attribs['HREF'];
    }
    }
    function defaultHandler($xp, $cdata) {

    }
    function &getXMLData(){
    $p = $this->parse();
    if(PEAR::isError($p)){ die($p->getMessage()); }
    return $this->XMLData;
    }
    }
    ?>
    //--------------- CODE END --------------------------------------------------//



    This problem is really bad, because event IDs have been truncated as
    well and then my SQL statements didn't work anymore!

    I have no idea what the reason is or even might be =(
    A bug in Expat? ... I can't really believe that.
    Badly formatted XML? ... Not really.
    Problems with Expat's memory management?
    Or just my fault? ... Where?

    But it seems that the problem is tied to the format of the XML file.
    If I take out line breaks or add lines, the error occurs in other
    places.
    But the same structure always produces the same errors.
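
    One explanation that would fit these symptoms (an assumption, not
    something confirmed in this thread): Expat does not guarantee that the
    text of one element arrives in a single call to the character data
    handler. If the text happens to straddle the end of an input block, it can
    be delivered in several pieces, and a handler that assigns with "="
    keeps only the last piece. The sketch below simulates that situation
    without a real parser; both classes and the two-piece split are
    hypothetical and only mirror the shape of cdataHandler() in the parser
    class above.

```php
<?php
// Illustration only: hypothetical stand-ins for a handler object.
// No real parser is involved; the pieces are fed in by hand.
class OverwritingHandler
{
    public $element = '';
    public $eventName = '';

    public function cdataHandler($cdata)
    {
        if ($this->element == 'EVENTNAME') {
            $this->eventName = $cdata;      // keeps only the most recent piece
        }
    }
}

class AppendingHandler extends OverwritingHandler
{
    public function cdataHandler($cdata)
    {
        if ($this->element == 'EVENTNAME') {
            $this->eventName .= $cdata;     // collects every piece
        }
    }
}

// Assumed delivery: the input block ends right after "Mar", so the text of
// a single <EventName> element arrives in two calls.
$pieces = array('Mar', 'tin Schneider Karben');

$overwriting = new OverwritingHandler();
$appending   = new AppendingHandler();
foreach (array($overwriting, $appending) as $handler) {
    $handler->element = 'EVENTNAME';        // what startHandler would set
    foreach ($pieces as $piece) {
        $handler->cdataHandler($piece);
    }
    $handler->element = '';                 // what endHandler would set
}

echo $overwriting->eventName, "\n";  // prints "tin Schneider Karben"
echo $appending->eventName, "\n";    // prints "Martin Schneider Karben"
```

    If this is indeed the cause, the real parser would probably need to append
    with ".=" in every branch (initialising each field first) and move the
    group calculation from the CITY branch into endHandler(), where the
    complete text is known.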


    My XML skills are not that good, so I would be very grateful for any
    ideas or advice.

    Thanks for your advice.

    With best regards

    Fabian Krüger