How to process 1000 XML files into 1 file?

Discussion in 'XML' started by AMDx64BT, Mar 7, 2011.

  1. AMDx64BT

    AMDx64BT Guest

    -- DESCRIPTION -----------------------

    I would like to write a script with awk or vim to process lab blood tests in XML format so that I can import them into SPSS.

    Each blood test is an xml file.

    If I have (for example):
    1000 Lab Blood Tests (1000 xml files)
    250 patients
    4 blood tests/patient

    Each Blood Test file has the name format: rapport_33405954.xml

    Each blood test has a variable number of components, but I am only interested in analyzing 3 elements: K, Na and Ca. (These elements are not included in every blood test.)


    -- PATIENT -----------------------

    -- SOURCE:

    <Patient>
    <lbpa_Npa>1234</lbpa_Npa>
    <lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
    <Entree>15-Oct-1582 01:00:00</Entree>
    <Pid>0</Pid>
    <Ncas>0</Ncas>
    <pre10 />
    <lbpa_Pre>Peter</lbpa_Pre>
    <lbpa_Num_Npat>1234567</lbpa_Num_Npat>
    <lbrq_Nom1 />
    <lbpa_Adr2>Paris</lbpa_Adr2>
    <lbrq_Nom2 />
    <lbpa_Sexe>M</lbpa_Sexe>
    <nom10 />
    <lbrq_Rid>0</lbrq_Rid>
    <Actif />
    <lbpa_Adr />
    <lbpa_Nom>Smith</lbpa_Nom>
    <Adm />
    </Patient>

    -- RESULT:

    (last_name first_name, date_of_birth)
    lbpa_Nom lbpa_Pre, lbpa_Nai
    Smith Peter, 1923.10.02
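
For what it's worth, here is a minimal Python sketch of this step. It assumes the `<Patient>` block sits somewhere under a single root element (the `<Rapport>` wrapper below is hypothetical, since the post does not show the root) and that month names are English abbreviations as in the sample. The function name `patient_key` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the sample above; the <Rapport> root is an assumption.
SAMPLE = """<Rapport>
  <Patient>
    <lbpa_Nai>02-Oct-1923 00:00:00</lbpa_Nai>
    <lbpa_Pre>Peter</lbpa_Pre>
    <lbpa_Nom>Smith</lbpa_Nom>
  </Patient>
</Rapport>"""

def patient_key(root):
    """Return (surname, first name, birth date as YYYY.MM.DD) for the first <Patient>."""
    patient = next(root.iter("Patient"))  # raises StopIteration if there is no <Patient>
    born = datetime.strptime(patient.findtext("lbpa_Nai"), "%d-%b-%Y %H:%M:%S")
    return (patient.findtext("lbpa_Nom"),
            patient.findtext("lbpa_Pre"),
            born.strftime("%Y.%m.%d"))

print(patient_key(ET.fromstring(SAMPLE)))
```

For a real file, `ET.parse("rapport_33405954.xml").getroot()` would replace `ET.fromstring(SAMPLE)`.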


    -- DATE BLOOD TAKEN -----------------------

    -- SOURCE:

    <Demande>
    <Entree>15-Oct-1582 01:00:00</Entree>
    <lbde_Rid>12345</lbde_Rid>
    <lbde_Nlab>12345</lbde_Nlab>
    <Sortie>15-Oct-1582 01:00:00</Sortie>
    <NarunaFile />
    <Ncas>0</Ncas>
    <Etabl />
    <lbde_Num_Npat>12345</lbde_Num_Npat>
    <Naruna />
    <Date_Mod>01-Jan-1900 00:00:00</Date_Mod>
    <Taille>0</Taille>
    <lbde_pid>12345/111</lbde_pid>
    <TCollection>0</TCollection>
    <Semgr>0</Semgr>
    <lbrq_nom1 />
    <lbde_Dtprv>02-Mar-2011 06:00:00</lbde_Dtprv>
    <Pathologique>FALSE</Pathologique>
    <lbrq_nom2 />
    <Bacterio>FALSE</Bacterio>
    <Volume>0</Volume>
    <Type_www />
    <Poids>0</Poids>
    <lbde_Dtdem>02-Mar-2011 07:18:32</lbde_Dtdem>
    <PasVue>FALSE</PasVue>
    <par />
    <Domaine />
    </Demande>

    -- RESULT:

    (Date_taken_blood)
    lbde_Dtprv
    2011.03.02
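
The sampling date could be pulled out the same way. This is only a sketch: the `<Rapport>` root and the function name `date_taken` are assumptions, and only `lbde_Dtprv` is read, as in the result above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Trimmed copy of the sample above; the <Rapport> root is an assumption.
SAMPLE = """<Rapport>
  <Demande>
    <lbde_Dtprv>02-Mar-2011 06:00:00</lbde_Dtprv>
    <lbde_Dtdem>02-Mar-2011 07:18:32</lbde_Dtdem>
  </Demande>
</Rapport>"""

def date_taken(root):
    """Return the sampling date (lbde_Dtprv) of the first <Demande> as YYYY.MM.DD."""
    demande = next(root.iter("Demande"))
    stamp = datetime.strptime(demande.findtext("lbde_Dtprv"), "%d-%b-%Y %H:%M:%S")
    return stamp.strftime("%Y.%m.%d")

print(date_taken(ET.fromstring(SAMPLE)))
```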


    -- ELEMENT -----------------------

    -- SOURCE:

    <Analyse>
    <OrdreImpression>12345</OrdreImpression>
    <CodeMateriel />
    <TypeLigne>0</TypeLigne>
    <Formulaire>21</Formulaire>
    <Norme>136 - 145 mmol/l</Norme>
    <Code>2039</Code>
    <Commentaire />
    <Anterieur />
    <TypeResultat>0</TypeResultat>
    <Resultat>136</Resultat>
    <Unite>mmol/l</Unite>
    <Remarque />
    <Clos>O</Clos>
    <Libelle>Sodium</Libelle>
    </Analyse>

    -- RESULT:

    (Element number)
    Libelle Resultat
    Sodium 136
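
A possible sketch for picking out only the wanted elements. Only the label "Sodium" appears in the sample; "Potassium" and "Calcium" are guesses at what the other `<Libelle>` values look like, and the Glucose entry is invented just to show filtering.

```python
import xml.etree.ElementTree as ET

# Libelle -> short symbol. Only "Sodium" is confirmed by the sample;
# the other two labels are assumptions.
WANTED = {"Sodium": "Na", "Potassium": "K", "Calcium": "Ca"}

# The <Rapport> root and the Glucose analysis are hypothetical.
SAMPLE = """<Rapport>
  <Analyse>
    <Resultat>136</Resultat>
    <Unite>mmol/l</Unite>
    <Libelle>Sodium</Libelle>
  </Analyse>
  <Analyse>
    <Resultat>5.1</Resultat>
    <Libelle>Glucose</Libelle>
  </Analyse>
</Rapport>"""

def wanted_results(root):
    """Return [(symbol, result)] for every <Analyse> whose Libelle is in WANTED."""
    found = []
    for analyse in root.iter("Analyse"):
        label = (analyse.findtext("Libelle") or "").strip()
        value = (analyse.findtext("Resultat") or "").strip()
        if label in WANTED and value:  # skip unwanted labels and empty results
            found.append((WANTED[label], value))
    return found

print(wanted_results(ET.fromstring(SAMPLE)))
```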


    -- SORT ELEMENT BY DATE -----------------------

    Sodium 05.01.2011 --> Na1
    Sodium 08.01.2011 --> Na3
    Sodium 06.01.2011 --> Na2
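
The numbering by date could be sketched like this, assuming the dates are in DD.MM.YYYY form as written above. The function name `label_by_date` is hypothetical, and two readings on the same date would collapse into one entry.

```python
from datetime import datetime

def label_by_date(symbol, dates):
    """Map each DD.MM.YYYY date to symbol+rank, with rank 1 for the earliest date."""
    ordered = sorted(dates, key=lambda d: datetime.strptime(d, "%d.%m.%Y"))
    return {d: f"{symbol}{rank}" for rank, d in enumerate(ordered, start=1)}

labels = label_by_date("Na", ["05.01.2011", "08.01.2011", "06.01.2011"])
for date, label in labels.items():
    print("Sodium", date, "-->", label)
```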


    -- FINAL RESULT -----------------------

    From the 1000 files I want to obtain a single file in this format, so that I can import it into SPSS:

                               Na1  Na2  K1   K2
    Smith Peter  19231002  136  133  4    3.5
    Gates Edward 19801204  145  166  3.1  3.4

    (In this case the date of Na1 for Smith and for Gates could be different, but the variable Na1 is the same.)
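
Putting the pieces together, one way to get from many reports to one wide table might look like the sketch below. It is not a tested solution for the real files. The assumptions are: each file holds one `<Patient>`, one `<Demande>` and any number of `<Analyse>` blocks under some root; the labels "Potassium" and "Calcium" are guesses; the column names `Nom`, `Pre`, `Nai` are invented; and a patient is identified by surname, first name and birth date.

```python
import csv
import glob
import io
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

WANTED = {"Sodium": "Na", "Potassium": "K", "Calcium": "Ca"}  # last two labels assumed
STAMP = "%d-%b-%Y %H:%M:%S"

def read_report(source):
    """Return (patient key, sampling datetime, {symbol: result}) for one report."""
    root = ET.parse(source).getroot()
    patient = next(root.iter("Patient"))
    born = datetime.strptime(patient.findtext("lbpa_Nai"), STAMP).strftime("%Y%m%d")
    key = (patient.findtext("lbpa_Nom"), patient.findtext("lbpa_Pre"), born)
    taken = datetime.strptime(next(root.iter("Demande")).findtext("lbde_Dtprv"), STAMP)
    results = {}
    for analyse in root.iter("Analyse"):
        symbol = WANTED.get(analyse.findtext("Libelle"))
        value = analyse.findtext("Resultat")
        if symbol and value:
            results[symbol] = value  # a repeated label in one report keeps the last value
    return key, taken, results

def build_rows(sources, per_element=4):
    """Return a header row plus one wide row per patient, readings ordered by date."""
    readings = defaultdict(lambda: defaultdict(list))  # patient -> symbol -> [(date, value)]
    for source in sources:
        key, taken, results = read_report(source)
        patient_readings = readings[key]  # registers the patient even with no wanted results
        for symbol, value in results.items():
            patient_readings[symbol].append((taken, value))
    symbols = list(WANTED.values())
    rows = [["Nom", "Pre", "Nai"] +
            [f"{s}{i}" for s in symbols for i in range(1, per_element + 1)]]
    for key in sorted(readings):
        row = list(key)
        for symbol in symbols:
            values = [v for _, v in sorted(readings[key][symbol])][:per_element]
            row += values + [""] * (per_element - len(values))  # pad missing readings
        rows.append(row)
    return rows

if __name__ == "__main__":
    # Writes a tab-separated table for every rapport_*.xml in the current directory.
    rows = build_rows(sorted(glob.glob("rapport_*.xml")))
    csv.writer(sys.stdout, delimiter="\t").writerows(rows)
```

Run in the directory holding the reports and redirect the output to a file; missing readings come out as empty cells, which SPSS should be able to treat as missing values on import.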


    Any advice is appreciated
    AMDx64BT, Mar 7, 2011
    #1

  2. Well, that is the kind of problem which can be solved by the canonical
    Desperate Perl Hacker (DPH) by writing a specialized parser. I don't
    know enough about either awk or vim to know whether they would be able
    to do this for you or not.

    Personally, I'd do it in Java using an off-the-shelf XML parser. The DPH
    code might actually run faster, but would be less robust and less
    maintainable.
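
To illustrate the robustness point (in Python rather than Perl or Java, purely as a sketch): a hand-rolled pattern match can miss input that is still perfectly valid XML, while a real parser handles it. The fragment below is invented for the demonstration; it puts a line break inside the start tag and uses an escaped `<` in the value.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: legal XML, but awkward for a naive pattern.
FRAGMENT = "<Analyse><Resultat\n>&lt;0.5</Resultat><Libelle>Calcium</Libelle></Analyse>"

def by_regex(text):
    """Quick-hack extraction: expects the start tag to be written exactly '<Resultat>'."""
    match = re.search(r"<Resultat>(.*?)</Resultat>", text)
    return match.group(1) if match else None

def by_parser(text):
    """Parser-based extraction: tolerates whitespace in tags and decodes entities."""
    return ET.fromstring(text).findtext("Resultat")

print(by_regex(FRAGMENT))   # the pattern does not match this start tag
print(by_parser(FRAGMENT))  # the parser returns the decoded text
```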
    Joe Kesselman, Mar 8, 2011
    #2

  3. Peter Flynn

    Peter Flynn Guest

    1000 files, but how many Patient, how many Demande, how many Analyse?

    Concatenate them into a single patient file, a single demande file and a
    single analyse file, then write XSLT2 to process them with lookups using
    document(). 1000 is not very many.

    Or use lxprintf to extract the data to three files, and then use
    standard tools like join(1) to create a single master file.
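
The join step can be sketched in Python as well. This mimics what join(1) does on the first field, and it assumes each extracted row starts with a shared key. Using the report file name as that key is an assumption, as are the sample rows and the function name `join_on_key`.

```python
def join_on_key(left, right):
    """Inner-join two lists of rows on their first column, similar to join(1)."""
    index = {}
    for row in right:
        index.setdefault(row[0], []).append(row[1:])
    # Keep the key once, then append the remaining fields of each matching right row.
    return [l + rest for l in left for rest in index.get(l[0], [])]

# Hypothetical extracted tables, keyed by report file name.
patients = [["rapport_33405954", "Smith", "Peter", "19231002"]]
demandes = [["rapport_33405954", "20110302"]]
analyses = [["rapport_33405954", "Sodium", "136"]]

master = join_on_key(join_on_key(patients, demandes), analyses)
for row in master:
    print("\t".join(row))
```

Unlike join(1), this does not need the inputs to be sorted on the key first.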

    ///Peter
    Peter Flynn, Mar 8, 2011
    #3