Parsing challenge...

A

Artco News

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

Thanks
 
A

Ara.T.Howard

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt


pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}


~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
 
P

Paulus Magnus

Artco News said:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

<?
//Assuming we use file to read the file we'll get each line in an array, so
we'll use $testdata as our sample file
$testdata = array ();
$testdata[] = "CODE#1^DESCRIPTION^CODE#2^NOTES\r\n";
$testdata[] = "NN-110^an info of NN-001^BRY234^some notes\r\n";
$testdata[] = "NN-111^1st line data\r\n";
$testdata[] = "2nd line data\r\n";
$testdata[] = "3rd line data^BRT345^another notes\r\n";
$testdata[] = "NN-112^description of NN-112^BBC23^multiline\r\n";
$testdata[] = "notes blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "blah\r\n";
$testdata[] = "NN-113^info info^MNO12^some notes here\r\n";

$dbdata = array ();
$row = "";
$cnt = 0;
foreach ($testdata as $line) {
$delimiters = preg_match_all ("/\^/", $line, $waste);
if (($cnt + $delimiters) > 3) {
$dbdata[] = $row;
$cnt = $delimiters;
$row = $line;
} else {
$row .= $line;
$cnt += $delimiters;
}
}
$dbdata[] = $row;
print_r ($dbdata);
?>

.... produces ...

Array (
[0] => CODE#1^DESCRIPTION^CODE#2^NOTES
[1] => NN-110^an info of NN-001^BRY234^some notes
[2] => NN-111^1st line data 2nd line data 3rd line data^BRT345^another
notes
[3] => NN-112^description of NN-112^BBC23^multiline notes blah blah blah
[4] => NN-113^info info^MNO12^some notes here
)

You can then easily iterate through this array, exploding each line by the ^
and creating the INSERT INTO table VALUES (); bits of SQL.

Paulus
 
U

Useko Netsumi

this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here


Ara.T.Howard said:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt


pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}


~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
 
U

Useko Netsumi

Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.

Useko Netsumi said:
this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here


Ara.T.Howard said:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt


pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}


~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
 
R

Robert Klemme

Artco News said:
I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

Ruby:

#!/usr/bin/ruby

def process(rec)
while rec.size > 4
dbRec = rec.slice!( 0..3 )
# db insertion here
p dbRec
end
end

rec = []

while ( line = gets )
line.chomp!
rec.concat( line.split('^') )
process rec
end

process rec
 
A

Ara.T.Howard

Got it! I just have to replace the (+) sign with (*) for blank or any
string.

Next, how do I insert those values into MySQL database, assuming I have
those table defined. Thanks.


file: parse.rb
----CUT----
#!/usr/bin/env ruby
require 'mysql'

# command line args
host, user, passwd, db, relation = ARGV
db ||= 'test'
relation ||= 'test'

# connect to db
mysql = Mysql.connect host, user, passwd
mysql.select_db db

# parse
txt = DATA.read
pat = %r{([^^]*)\^([^^]*)\^([^^]*)\^([^^]*)\n}mox
tuples = txt.scan pat

# insert tuples
sql = "insert into %s values('%s','%s','%s','%s')"
tuples.each do |tuple|
begin
insert = sql % [relation, *tuple]
mysql.query insert
rescue Exception => e
p e
end
end

# show results
res = mysql.query('select * from %s' % [relation])
while((row = res.fetch_row))
p row
end


# sample input is embedded below - can be read via DATA object
__END__
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
----CUT----



running it looks like:

~/eg/ruby > ./parse.rb
["CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
["NN-110", "an info of NN-001", "BRY234", "some notes"]
["NN-111", "1st line data\n2nd line data\n3rd line data", "BRT345", "another notes"]
["NN-112", "description of NN-112", "BBC23", "multiline\nnotes blah\nblah\nblah"]
["NN-113", "info info", "MNO12", "some notes here"]


i created a database named 'test', and a table named 'test' using 'create
table test(f0 text,f1 text,f2 text,f3 text)'

hope that gets you going.

-a









Useko Netsumi said:
this script failed if any of the cell is blank/no-value,
e.g:

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^^^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here


Ara.T.Howard said:
On Tue, 7 Oct 2003, Artco News wrote:

I thought I ask the scripting guru about the following.

I have a file containing records of data with the following format(first
column is the label):

CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here

How do I parse so I can insert them in the database, e.g. MySQL/Access?

Perhaps there are an advanced scripting language can do this easily.

ruby is one of the more advanced :)

~/eg/ruby > cat ./parse.rb

#!/usr/bin/env ruby

txt = <<-txt
CODE#1^DESCRIPTION^CODE#2^NOTES
NN-110^an info of NN-001^BRY234^some notes
NN-111^1st line data
2nd line data
3rd line data^BRT345^another notes
NN-112^description of NN-112^BBC23^multiline
notes blah
blah
blah
NN-113^info info^MNO12^some notes here
txt


pat = %r{([^^]+)\^([^^]+)\^([^^]+)\^([^^]+)\n}mox
tuples = txt.scan pat

tuples.map{|tuple| p tuple}


~/eg/ruby > ./parse.rb

[" CODE#1", "DESCRIPTION", "CODE#2", "NOTES"]
[" NN-110", "an info of NN-001", "BRY234", "some notes"]
[" NN-111", "1st line data\n 2nd line data\n 3rd line data", "BRT345", "another notes"]
[" NN-112", "description of NN-112", "BBC23", "multiline\n notes blah\n blah\n blah"]
[" NN-113", "info info", "MNO12", "some notes here"]

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================

====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top