Text parser / reformatting


Marc Hoeppner

Hi everyone,

I expect this is a rather trivial problem, but I just started using Ruby
and am a bit stuck right now.
Here is what I want to do:

I have a text file that contains information in the following format:

KOG0003
  At2g36170
  At3g52590
  CE15495
  7295730
KOG0004
  Hs20476120
  YIL148w
  YKR094c
  SPAC11G7.04

Now, this has to go into a relational database. But right now this is
not really a table. The desired output would look something like this:

KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c

Well, you get the picture. What I tried was to read the text file,
then look for lines that start with a blank and replace that blank with
the first word of the nearest preceding line that starts with a word
(such lines could also be selected with KOG[0-9]*). I thought of
storing the KOG[0-9]* identifier in a variable, but overall I can't
make it work and have no real idea how to solve this. Any help would
be greatly appreciated. I guess for an experienced user this is a
three-liner.

Cheers,

Marc
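The approach described in the question does work once the key is kept in a variable across loop iterations. A minimal sketch, assuming the member lines are indented with spaces or tabs as in the sample; the method name `prefix_indented_lines` is made up for illustration:

```ruby
# Keep the most recent header word in a variable and substitute it for
# the leading blank of every indented line that follows.
def prefix_indented_lines(lines)
  key = nil
  lines.each_with_object([]) do |line, out|
    line = line.chomp
    if line =~ /\A(\S+)/                 # header line such as "KOG0003"
      key = $1
    elsif key && line =~ /\A\s+\S/       # indented member line under a known key
      out << line.sub(/\A\s+/) { "#{key} " }
    end                                  # blank lines and orphan members are skipped
  end
end

sample = ["KOG0003\n", "  At2g36170\n", "  At3g52590\n",
          "KOG0004\n", "  YIL148w\n"]
puts prefix_indented_lines(sample)
```

To run it on a file, something like `puts prefix_indented_lines(File.foreach("data.dat"))` should work, since `File.foreach` without a block returns an enumerator (the file name is only an example).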
 

Robert Klemme

On 2007/7/9, Marc Hoeppner wrote:
> Guess for an experienced user this is a three-liner

Hm... Maybe something like this:

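key = nil
ARGF.each do |line|   # reads the files named on the command line, or stdin
  line.chomp!
  case line
  when /^(\S+)/       # line starts with a word: a new key such as KOG0003
    key = line.strip
  when /^\s+(\S+)/    # indented line: print it prefixed with the current key
    print key, " ", $1, "\n" if key
  else
    # ignore (e.g. blank lines)
  end
end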

Kind regards

robert
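Robert's loop can also be wrapped in a method that collects pairs instead of printing them, which makes it easier to reuse or test. This sketch keeps his `case`/`when` structure; it relies on `when` calling `Regexp#===`, which sets `$~`, so `$1` refers to the group captured by the branch that matched. Unlike the original, it stores only the first word of a header line (`$1`) rather than the whole stripped line, which is what the question asked for. The name `key_member_pairs` is hypothetical:

```ruby
# Robert's case/when loop, collecting [key, member] pairs instead of printing.
def key_member_pairs(lines)
  key = nil
  pairs = []
  lines.each do |line|
    case line.chomp
    when /\A(\S+)/             # header line: remember its first word
      key = $1
    when /\A\s+(\S+)/          # indented line: pair its first word with the key
      pairs << [key, $1] if key
    end                        # anything else (e.g. a blank line) is ignored
  end
  pairs
end

key_member_pairs(["KOG0003\n", "  At2g36170\n", "KOG0004\n", "  SPAC11G7.04\n"]).each do |k, v|
  puts "#{k} #{v}"
end
```

Returning the pairs also leaves the choice of output format (space-separated, tab-separated, SQL inserts) to the caller.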
 

Alex Gutteridge

Not a very fancy solution, but it seems to work for the data you
posted. It also uses the pattern you suggested, storing the KOG*
identifier in a variable (field1):

[alexg@powerbook]/Users/alexg/Desktop(7): cat test.rb
field1 = nil
IO.foreach(ARGV[0]) do |l|
  if l.match(/^(\S+)/)
    field1 = $1
  else
    puts "#{field1} #{l.strip}"
  end
end
[alexg@powerbook]/Users/alexg/Desktop(8): cat data.dat
KOG0003
  At2g36170
  At3g52590
  CE15495
  7295730
KOG0004
  Hs20476120
  YIL148w
  YKR094c
  SPAC11G7.04
[alexg@powerbook]/Users/alexg/Desktop(9): ruby test.rb data.dat
KOG0003 At2g36170
KOG0003 At3g52590
KOG0003 CE15495
KOG0003 7295730
KOG0004 Hs20476120
KOG0004 YIL148w
KOG0004 YKR094c
KOG0004 SPAC11G7.04

Alex Gutteridge

Bioinformatics Center
Kyoto University
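If the data file ends with a blank line, the `else` branch in Alex's script prints the current key followed by a space, and any indented line before the first header is printed with an empty key. A slightly more defensive variant (a sketch, not taken from the thread) groups the lines with `Enumerable#slice_before` and emits tab-separated rows; the tab separator is an assumption chosen because such files are easy to bulk-load into a relational table, and `tsv_rows` is a made-up name:

```ruby
# Turn "header + indented members" blocks into key<TAB>member rows.
def tsv_rows(lines)
  lines.map(&:chomp)
       .reject { |l| l.strip.empty? }              # drop blank lines
       .slice_before { |l| l =~ /\A\S/ }           # start a new chunk at each header line
       .select { |chunk| chunk.first =~ /\A\S/ }   # discard members that precede any header
       .flat_map { |header, *members|
         key = header.split.first                  # first word of the header, e.g. "KOG0003"
         members.map { |m| "#{key}\t#{m.strip}" }
       }
end

sample = ["KOG0003\n", "  At2g36170\n", "  CE15495\n", "\n",
          "KOG0004\n", "  YKR094c\n", "  SPAC11G7.04\n"]
puts tsv_rows(sample)
```

Reading the real file works the same way, e.g. `puts tsv_rows(File.foreach(ARGV[0]))`.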
 
