Parsing challenge...

Discussion in 'Ruby' started by Artco News, Oct 7, 2003.

  1. Artco News

    Artco News Guest

    I thought I ask the scripting guru about the following.

    I have a file containing records of data with the following format(first
    column is the label):

    CODE#1^DESCRIPTION^CODE#2^NOTES
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^another notes
    NN-112^description of NN-112^BBC23^multiline
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here

    How do I parse so I can insert them in the database, e.g. MySQL/Access?

    Perhaps there are an advanced scripting language can do this easily.

    Thanks
    Artco News, Oct 7, 2003
    #1
    1. Advertising

  2. Artco News

    Ara.T.Howard Guest

    On Tue, 7 Oct 2003, Artco News wrote:

    > I thought I ask the scripting guru about the following.
    >
    > I have a file containing records of data with the following format(first
    > column is the label):
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse so I can insert them in the database, e.g. MySQL/Access?
    >
    > Perhaps there are an advanced scripting language can do this easily.


    ruby is one of the more advanced :)

    ~/eg/ruby > cat ./parse.rb

    #!/usr/bin/env ruby

    txt = <<-txt
    CODE#1^DESCRIPTION^CODE#2^NOTES
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^another notes
    NN-112^description of NN-112^BBC23^multiline
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here
    txt


    pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
    tuples = txt.scan pat

    tuples.map{|tuple| p tuple}


    ~/eg/ruby > ./parse.rb

    [" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
    [" NN-110", "an info of NN-001", "BRY234", "some notes"]
    [" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
    [" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
    [" NN-113", "info info", "MNO12", "some notes here"]

    -a
    ====================================
    | Ara Howard
    | NOAA Forecast Systems Laboratory
    | Information and Technology Services
    | Data Systems Group
    | R/FST 325 Broadway
    | Boulder, CO 80305-3328
    | Email:
    | Phone: 303-497-7238
    | Fax: 303-497-7259
    | The difference between art and science is that science is what we understand
    | well enough to explain to a computer. Art is everything else.
    | -- Donald Knuth, "Discover"
    | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
    ====================================
    Ara.T.Howard, Oct 7, 2003
    #2
    1. Advertising

  3. "Artco News" <> wrote in message
    news:CTDgb.31522$...
    > I thought I ask the scripting guru about the following.
    >
    > I have a file containing records of data with the following format(first
    > column is the label):
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse so I can insert them in the database, e.g. MySQL/Access?


    <?
    //Assuming we use file to read the file we'll get each line in an array, so
    we'll use $testdata as our sample file
    $testdata = array ();
    $testdata[] = "CODE#1^DESCRIPTION^CODE#2^NOTES\r\n";
    $testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
    $testdata[] = "NN-111^1st line data\r\n";
    $testdata[] = "2nd line data\r\n";
    $testdata[] = "3rd line data^BRT345^another notes\r\n";
    $testdata[] = "NN-112^description of NN-112^BBC23^multiline\r\n";
    $testdata[] = "notes blah\r\n";
    $testdata[] = "blah\r\n";
    $testdata[] = "blah\r\n";
    $testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

    $dbdata = array ();
    $row = "";
    $cnt = 0;
    foreach ($testdata as $line) {
    $delimiters = preg_match_all ("/\^/", $line, $waste);
    if (($cnt + $delimiters) > 3) {
    $dbdata[] = $row;
    $cnt = $delimiters;
    $row = $line;
    } else {
    $row .= $line;
    $cnt += $delimiters;
    }
    }
    $dbdata[] = $row;
    print_r ($dbdata);
    ?>

    .... produces ...

    Array (
    [0] => CODE#1^DESCRIPTION^CODE#2^NOTES
    [1] => NN-110^an info of NN-001^BRY234^some notes
    [2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^another
    notes
    [3] => NN-112^description of NN-112^BBC23^multiline notes blah blah blah
    [4] => NN-113^info info^MNO12^some notes here
    )

    You can then easily iterate through this array, exploding each line by the ^
    and creating the INSERT INTO table VALUES (); bits of SQL.

    Paulus
    Paulus Magnus, Oct 8, 2003
    #3
  4. this script failed if any of the cell is blank/no-value,
    e.g:

    CODE#1^DESCRIPTION^CODE#2^NOTES
    NN-110^^^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^another notes
    NN-112^description of NN-112^BBC23^multiline
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here


    "Ara.T.Howard" <> wrote in message
    news:p...
    > On Tue, 7 Oct 2003, Artco News wrote:
    >
    > > I thought I ask the scripting guru about the following.
    > >
    > > I have a file containing records of data with the following format(first
    > > column is the label):
    > >
    > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > NN-110^an info of NN-001^BRY234^some notes
    > > NN-111^1st line data
    > > 2nd line data
    > > 3rd line data^BRT345^another notes
    > > NN-112^description of NN-112^BBC23^multiline
    > > notes blah
    > > blah
    > > blah
    > > NN-113^info info^MNO12^some notes here
    > >
    > > How do I parse so I can insert them in the database, e.g. MySQL/Access?
    > >
    > > Perhaps there are an advanced scripting language can do this easily.

    >
    > ruby is one of the more advanced :)
    >
    > ~/eg/ruby > cat ./parse.rb
    >
    > #!/usr/bin/env ruby
    >
    > txt = <<-txt
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    > txt
    >
    >
    > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
    > tuples = txt.scan pat
    >
    > tuples.map{|tuple| p tuple}
    >
    >
    > ~/eg/ruby > ./parse.rb
    >
    > [" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
    > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
    > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",

    "BRT345", "another notes"]
    > [" NN-112", "description of NN-112", "BBC23", "multiline\n notes

    blah\n blah\n blah"]
    > [" NN-113", "info info", "MNO12", "some notes here"]
    >
    > -a
    > ====================================
    > | Ara Howard
    > | NOAA Forecast Systems Laboratory
    > | Information and Technology Services
    > | Data Systems Group
    > | R/FST 325 Broadway
    > | Boulder, CO 80305-3328
    > | Email:
    > | Phone: 303-497-7238
    > | Fax: 303-497-7259
    > | The difference between art and science is that science is what we

    understand
    > | well enough to explain to a computer. Art is everything else.
    > | -- Donald Knuth, "Discover"
    > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print

    \"\x3a\x2d\x29\x0a\""; done'
    > ====================================
    Useko Netsumi, Oct 8, 2003
    #4
  5. Got it! I just have to replace the (+) sign with (*) for blank or any
    string.

    Next, how do I insert those values into MySQL database, assuming I have
    those table defined. Thanks.

    "Useko Netsumi" <> wrote in message
    news:bm08ec$gd9eb$-berlin.de...
    > this script failed if any of the cell is blank/no-value,
    > e.g:
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^^^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    >
    > "Ara.T.Howard" <> wrote in message
    > news:p...
    > > On Tue, 7 Oct 2003, Artco News wrote:
    > >
    > > > I thought I ask the scripting guru about the following.
    > > >
    > > > I have a file containing records of data with the following

    format(first
    > > > column is the label):
    > > >
    > > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > > NN-110^an info of NN-001^BRY234^some notes
    > > > NN-111^1st line data
    > > > 2nd line data
    > > > 3rd line data^BRT345^another notes
    > > > NN-112^description of NN-112^BBC23^multiline
    > > > notes blah
    > > > blah
    > > > blah
    > > > NN-113^info info^MNO12^some notes here
    > > >
    > > > How do I parse so I can insert them in the database, e.g.

    MySQL/Access?
    > > >
    > > > Perhaps there are an advanced scripting language can do this easily.

    > >
    > > ruby is one of the more advanced :)
    > >
    > > ~/eg/ruby > cat ./parse.rb
    > >
    > > #!/usr/bin/env ruby
    > >
    > > txt = <<-txt
    > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > NN-110^an info of NN-001^BRY234^some notes
    > > NN-111^1st line data
    > > 2nd line data
    > > 3rd line data^BRT345^another notes
    > > NN-112^description of NN-112^BBC23^multiline
    > > notes blah
    > > blah
    > > blah
    > > NN-113^info info^MNO12^some notes here
    > > txt
    > >
    > >
    > > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
    > > tuples = txt.scan pat
    > >
    > > tuples.map{|tuple| p tuple}
    > >
    > >
    > > ~/eg/ruby > ./parse.rb
    > >
    > > [" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
    > > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
    > > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",

    > "BRT345", "another notes"]
    > > [" NN-112", "description of NN-112", "BBC23", "multiline\n notes

    > blah\n blah\n blah"]
    > > [" NN-113", "info info", "MNO12", "some notes here"]
    > >
    > > -a
    > > ====================================
    > > | Ara Howard
    > > | NOAA Forecast Systems Laboratory
    > > | Information and Technology Services
    > > | Data Systems Group
    > > | R/FST 325 Broadway
    > > | Boulder, CO 80305-3328
    > > | Email:
    > > | Phone: 303-497-7238
    > > | Fax: 303-497-7259
    > > | The difference between art and science is that science is what we

    > understand
    > > | well enough to explain to a computer. Art is everything else.
    > > | -- Donald Knuth, "Discover"
    > > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print

    > \"\x3a\x2d\x29\x0a\""; done'
    > > ====================================

    >
    >
    Useko Netsumi, Oct 8, 2003
    #5
  6. Artco News

    Harry Ohlsen Guest

    Useko Netsumi wrote:

    > this script failed if any of the cell is blank/no-value,
    > e.g:


    You may be able to simply change each of the "+" (one or more) in the regex into "*" (zero or more). I gave it a quick test and it seems to work OK, but I didn't test very hard.

    pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox

    Good luck.

    Harry O.
    Harry Ohlsen, Oct 8, 2003
    #6
  7. "Artco News" <> schrieb im Newsbeitrag
    news:CTDgb.31522$...
    > I thought I ask the scripting guru about the following.
    >
    > I have a file containing records of data with the following format(first
    > column is the label):
    >
    > CODE#1^DESCRIPTION^CODE#2^NOTES
    > NN-110^an info of NN-001^BRY234^some notes
    > NN-111^1st line data
    > 2nd line data
    > 3rd line data^BRT345^another notes
    > NN-112^description of NN-112^BBC23^multiline
    > notes blah
    > blah
    > blah
    > NN-113^info info^MNO12^some notes here
    >
    > How do I parse so I can insert them in the database, e.g. MySQL/Access?
    >
    > Perhaps there are an advanced scripting language can do this easily.


    Ruby:

    #!/usr/bin/ruby

    def process(rec)
    while rec.size > 4
    dbRec = rec.slice!( 0..3 )
    # db insertion here
    p dbRec
    end
    end

    rec = []

    while ( line = gets )
    line.chomp!
    rec.concat( line.split('^') )
    process rec
    end

    process rec
    Robert Klemme, Oct 8, 2003
    #7
  8. Artco News

    Ara.T.Howard Guest

    On Wed, 8 Oct 2003, Useko Netsumi wrote:

    > Got it! I just have to replace the (+) sign with (*) for blank or any
    > string.
    >
    > Next, how do I insert those values into MySQL database, assuming I have
    > those table defined. Thanks.



    file: parse.rb
    ----CUT----
    #!/usr/bin/env ruby
    require 'mysql'

    # command line args
    host, user, passwd, db, relation = ARGV
    db ||= 'test'
    relation ||= 'test'

    # connect to db
    mysql = Mysql.connect host, user, passwd
    mysql.select_db db

    # parse
    txt = DATA.read
    pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
    tuples = txt.scan pat

    # insert tuples
    sql = "insert into %s values('%s','%s','%s','%s')"
    tuples.each do |tuple|
    begin
    insert = sql % [relation, *tuple]
    mysql.query insert
    rescue Exception => e
    p e
    end
    end

    # show results
    res = mysql.query('select * from %s' % [relation])
    while((row = res.fetch_row))
    p row
    end


    # sample input is embedded below - can be read via DATA object
    __END__
    CODE#1^DESCRIPTION^CODE#2^NOTES
    NN-110^an info of NN-001^BRY234^some notes
    NN-111^1st line data
    2nd line data
    3rd line data^BRT345^another notes
    NN-112^description of NN-112^BBC23^multiline
    notes blah
    blah
    blah
    NN-113^info info^MNO12^some notes here
    ----CUT----



    running it looks like:

    ~/eg/ruby > ./parse.rb
    ["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
    ["NN-110", "an info of NN-001", "BRY234", "some notes"]
    ["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
    ["NN-112", "description of NN-112", "BBC23", "multiline\nnotes blah\nblah\nblah"]
    ["NN-113", "info info", "MNO12", "some notes here"]


    i created a database named 'test', and a table named 'test' using 'create
    table test(f0 text,f1 text,f2 text,f3 text)'

    hope that gets you going.

    -a










    >
    > "Useko Netsumi" <> wrote in message
    > news:bm08ec$gd9eb$-berlin.de...
    > > this script failed if any of the cell is blank/no-value,
    > > e.g:
    > >
    > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > NN-110^^^some notes
    > > NN-111^1st line data
    > > 2nd line data
    > > 3rd line data^BRT345^another notes
    > > NN-112^description of NN-112^BBC23^multiline
    > > notes blah
    > > blah
    > > blah
    > > NN-113^info info^MNO12^some notes here
    > >
    > >
    > > "Ara.T.Howard" <> wrote in message
    > > news:p...
    > > > On Tue, 7 Oct 2003, Artco News wrote:
    > > >
    > > > > I thought I ask the scripting guru about the following.
    > > > >
    > > > > I have a file containing records of data with the following

    > format(first
    > > > > column is the label):
    > > > >
    > > > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > > > NN-110^an info of NN-001^BRY234^some notes
    > > > > NN-111^1st line data
    > > > > 2nd line data
    > > > > 3rd line data^BRT345^another notes
    > > > > NN-112^description of NN-112^BBC23^multiline
    > > > > notes blah
    > > > > blah
    > > > > blah
    > > > > NN-113^info info^MNO12^some notes here
    > > > >
    > > > > How do I parse so I can insert them in the database, e.g.

    > MySQL/Access?
    > > > >
    > > > > Perhaps there are an advanced scripting language can do this easily.
    > > >
    > > > ruby is one of the more advanced :)
    > > >
    > > > ~/eg/ruby > cat ./parse.rb
    > > >
    > > > #!/usr/bin/env ruby
    > > >
    > > > txt = <<-txt
    > > > CODE#1^DESCRIPTION^CODE#2^NOTES
    > > > NN-110^an info of NN-001^BRY234^some notes
    > > > NN-111^1st line data
    > > > 2nd line data
    > > > 3rd line data^BRT345^another notes
    > > > NN-112^description of NN-112^BBC23^multiline
    > > > notes blah
    > > > blah
    > > > blah
    > > > NN-113^info info^MNO12^some notes here
    > > > txt
    > > >
    > > >
    > > > pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
    > > > tuples = txt.scan pat
    > > >
    > > > tuples.map{|tuple| p tuple}
    > > >
    > > >
    > > > ~/eg/ruby > ./parse.rb
    > > >
    > > > [" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
    > > > [" NN-110", "an info of NN-001", "BRY234", "some notes"]
    > > > [" NN-111", "1st line data\n 2nd line data\n 3rd line data",

    > > "BRT345", "another notes"]
    > > > [" NN-112", "description of NN-112", "BBC23", "multiline\n notes

    > > blah\n blah\n blah"]
    > > > [" NN-113", "info info", "MNO12", "some notes here"]
    > > >
    > > > -a
    > > > ====================================
    > > > | Ara Howard
    > > > | NOAA Forecast Systems Laboratory
    > > > | Information and Technology Services
    > > > | Data Systems Group
    > > > | R/FST 325 Broadway
    > > > | Boulder, CO 80305-3328
    > > > | Email:
    > > > | Phone: 303-497-7238
    > > > | Fax: 303-497-7259
    > > > | The difference between art and science is that science is what we

    > > understand
    > > > | well enough to explain to a computer. Art is everything else.
    > > > | -- Donald Knuth, "Discover"
    > > > | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print

    > > \"\x3a\x2d\x29\x0a\""; done'
    > > > ====================================

    > >
    > >

    >
    >
    >


    ====================================
    | Ara Howard
    | NOAA Forecast Systems Laboratory
    | Information and Technology Services
    | Data Systems Group
    | R/FST 325 Broadway
    | Boulder, CO 80305-3328
    | Email:
    | Phone: 303-497-7238
    | Fax: 303-497-7259
    | The difference between art and science is that science is what we understand
    | well enough to explain to a computer. Art is everything else.
    | -- Donald Knuth, "Discover"
    | ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
    ====================================
    Ara.T.Howard, Oct 8, 2003
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Artco News

    Parsing challenge...

    Artco News, Oct 7, 2003, in forum: Perl
    Replies:
    6
    Views:
    478
    Ara.T.Howard
    Oct 8, 2003
  2. Artco News

    Parsing challenge...

    Artco News, Oct 7, 2003, in forum: Perl
    Replies:
    2
    Views:
    418
    Ed Morton
    Oct 7, 2003
  3. Xah Lee

    a little parsing challenge ☺

    Xah Lee, Jul 17, 2011, in forum: Python
    Replies:
    70
    Views:
    1,224
    John O'Hagan
    Jul 25, 2011
  4. Artco News

    Parsing challenge...

    Artco News, Oct 7, 2003, in forum: Ruby
    Replies:
    2
    Views:
    97
    Ed Morton
    Oct 7, 2003
  5. Felipe Espinoza

    Pdf Parsing Challenge

    Felipe Espinoza, May 17, 2011, in forum: Ruby
    Replies:
    7
    Views:
    153
    Johannes Held
    May 19, 2011
Loading...

Share This Page