Parsing challenge...

Discussion in 'Ruby' started by Artco News, Oct 7, 2003.

  1. Artco News

    Artco News Guest

    I thought I'd ask the scripting gurus about the following.

    I have a file containing records of data in the following format (first
    column is the label):

    CODE#1^DESCRIPTION^CODE#2^NOTES
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^another notes
    NN-112^description of NN-112^BBC23^multiline
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here

    How do I parse these records so I can insert them into a database, e.g.
    MySQL/Access?

    Perhaps there is an advanced scripting language that can do this easily.

    Thanks
    Artco News, Oct 7, 2003
    #1

  2. Artco News wrote:
    > I thought I'd ask the scripting gurus about the following.
    >
    > I have a file containing records of data in the following format (first
    > column is the label):
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse these records so I can insert them into a database, e.g.
    > MySQL/Access?
    >
    > Perhaps there is an advanced scripting language that can do this easily.


    Regex is your friend...

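    <?php
    // Read the whole file into memory.
    $fp = fopen('data.txt', 'r');
    $content = fread($fp, filesize('data.txt'));
    fclose($fp);
    // Swap every line break for a placeholder so that "." can match across
    // the original lines; the current timestamp serves as the placeholder.
    $tmp = (string) time();
    $content = preg_replace('/(\r\n|\r|\n)/', $tmp, $content);
    // Ungreedy (U) match: capture everything between "NN-111^" and the next caret.
    $pattern = '/NN-111\^(.*)\^/U';
    preg_match($pattern, $content, $matches);
    // Split the captured field back into its original lines.
    $data = explode($tmp, $matches[1]);
    unset($matches);
    unset($content);
    unset($tmp);
    echo '<pre>';
    print_r($data);
    echo '</pre>';
    ?>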

    This gives you an array holding each line of the NN-111 description
    field as a separate element. You should be able to see how to extract
    the notes and such from the example. I may be wrong, but it looks like
    both the caret (^) and the newline are used as field delimiters.
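
Since the thread lives in the Ruby forum, here is a minimal Ruby sketch of the same observation: the caret separates fields, and a newline only ends a record once all fields have been seen. It is not the code from the post above. The helper name `parse_records`, the constant `FIELDS`, and the inline `SAMPLE` data are assumptions for illustration. It also assumes that every record has exactly four fields and that continuation lines of the last field never contain a caret.

```ruby
# Hypothetical helper: groups physical lines into logical records by counting
# carets. Assumes exactly FIELDS fields per record and that continuation lines
# of the last field contain no caret.
FIELDS = 4

def parse_records(text)
  records = []
  buffer = nil
  text.each_line do |line|
    line = line.chomp
    if buffer.nil?
      buffer = line
    elsif buffer.scan("^").size < FIELDS - 1 || !line.include?("^")
      # Either a field is still open, or this is a caret-free continuation
      # of the last field: the line belongs to the current record.
      buffer << "\n" << line
    else
      # The current record is complete and this line starts a new one.
      records << buffer.split("^", FIELDS)
      buffer = line
    end
  end
  records << buffer.split("^", FIELDS) if buffer
  records
end

# Sample data copied from the original question.
SAMPLE = <<~'DATA'
  CODE#1^DESCRIPTION^CODE#2^NOTES
  NN-110^an info of NN-001^BRY234^some notes
  NN-111^1st line data
  2nd line data
  3rd line data^BRT345^another notes
  NN-112^description of NN-112^BBC23^multiline
  notes blah
  blah
  blah
  NN-113^info info^MNO12^some notes here
DATA

# The first record is the header line; the rest are data rows.
header, *rows = parse_records(SAMPLE)
rows.each { |row| p header.zip(row).to_h }
```

Counting carets avoids depending on any particular code prefix, at the cost of breaking if a notes continuation line ever contains a caret.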

    --
    Justin Koivisto -
    PHP POSTERS: Please use comp.lang.php for PHP related questions,
    alt.php* groups are not recommended.
    Justin Koivisto, Oct 7, 2003
    #2

  3. Artco News

    Ed Morton Guest

    Artco News wrote:
    > I thought I'd ask the scripting gurus about the following.
    >
    > I have a file containing records of data in the following format (first
    > column is the label):
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse these records so I can insert them into a database, e.g.
    > MySQL/Access?
    >
    > Perhaps there is an advanced scripting language that can do this easily.
    >
    > Thanks
    >


    This will parse them to make the records/fields obvious:

    gawk 'BEGIN { pat = "NN-"; RS = "\n" pat; FS = "^" }
    {
        printf("Record %d = {\n", NR)
        $1 = pat $1    # put back the prefix consumed by the record separator
        for (i = 1; i <= NF; i++) {
            printf("\tField %d = { %s }\n", i, $i)
        }
        printf("}\n")
    }' inputfile

    It'd be trivial to modify the output to whatever format your database
    expects. I used NN- at the start of a line as the record separator,
    hence the special handling of the first field to put that NN- back (the
    header line gets the prefix too, as Record 1 shows). When run on your
    sample input file, this produces:

    Record 1 = {
    Field 1 = { NN-CODE#1 }
    Field 2 = { DESCRIPTION }
    Field 3 = { CODE#2 }
    Field 4 = { NOTES }
    }
    Record 2 = {
    Field 1 = { NN-110 }
    Field 2 = { an info of NN-001 }
    Field 3 = { BRY234 }
    Field 4 = { some notes }
    }
    Record 3 = {
    Field 1 = { NN-111 }
    Field 2 = { 1st line data
    2nd line data
    3rd line data }
    Field 3 = { BRT345 }
    Field 4 = { another notes }
    }
    Record 4 = {
    Field 1 = { NN-112 }
    Field 2 = { description of NN-112 }
    Field 3 = { BBC23 }
    Field 4 = { multiline
    notes blah
    blah
    blah }
    }
    Record 5 = {
    Field 1 = { NN-113 }
    Field 2 = { info info }
    Field 3 = { MNO12 }
    Field 4 = { some notes here
    }
    }
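
For comparison, the same record-separator idea can be sketched in Ruby. This is only an illustration under assumptions, not a translation endorsed by the thread. The helper name `split_records`, the constant `PREFIX`, and the inline `INPUT` data are hypothetical. A lookahead splits at each newline that is followed by the prefix, so the prefix stays on the record and the header line is left untouched. Like the gawk version, it would mis-split if a continuation line happened to start with `NN-`.

```ruby
# Sketch of the record-separator approach: a new record starts at any line
# that begins with PREFIX. PREFIX and split_records are illustrative names.
PREFIX = "NN-"

def split_records(text, prefix = PREFIX)
  # Split at newlines followed by the prefix; the lookahead keeps the prefix
  # attached to the record. chomp drops the trailing newline of the file.
  header, *chunks = text.chomp.split(/\n(?=#{Regexp.escape(prefix)})/)
  names = header.split("^")
  # Pair each field with its column name from the header line.
  chunks.map { |chunk| names.zip(chunk.split("^", names.size)).to_h }
end

# Sample data copied from the original question.
INPUT = <<~'DATA'
  CODE#1^DESCRIPTION^CODE#2^NOTES
  NN-110^an info of NN-001^BRY234^some notes
  NN-111^1st line data
  2nd line data
  3rd line data^BRT345^another notes
  NN-112^description of NN-112^BBC23^multiline
  notes blah
  blah
  blah
  NN-113^info info^MNO12^some notes here
DATA

split_records(INPUT).each { |row| p row }
```

Each row comes back as a hash keyed by the header names, which should map directly onto whatever insert call your database library provides.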

    Regards,

    Ed.
    Ed Morton, Oct 7, 2003
    #3