Text parser / reformatting

Discussion in 'Ruby' started by Marc Hoeppner, Jul 9, 2007.

  1. Hi everyone,

    I expect this is a rather trivial problem, but I just started using Ruby
    and am a bit stuck right now.
    Here is what I want to do:

    I have a text file that contains information in the following format:

    KOG0003
      At2g36170
      At3g52590
      CE15495
      7295730
    KOG0004
      Hs20476120
      YIL148w
      YKR094c
      SPAC11G7.04

    Now, this has to go into a relational database. But right now this is
    not really a table. The desired output would look something like this:

    KOG0003 At2g36170
    KOG0003 At3g52590
    KOG0003 CE15495
    KOG0003 7295730
    KOG0004 Hs20476120
    KOG0004 YIL148w
    KOG0004 YKR094c

    Well, you get the picture. What I tried to do is to read the text file,
    then look for lines that start with a blank and replace that blank with
    the first word of the previous line, provided that line does in fact
    start with a word (it could also be selected using KOG[0-9]*). I
    thought of storing the KOG[0-9]* match in a variable, but overall I can't
    make it work and have no real idea how to solve this. Any help would be
    greatly appreciated. I guess for an experienced user this is a
    three-liner.

    Cheers,

    Marc

    --
    Posted via http://www.ruby-forum.com/.
    Marc Hoeppner, Jul 9, 2007
    #1
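The approach described in the question, remembering the most recent KOG identifier in a variable and prefixing each following entry with it, can be sketched as below. This is an illustration under stated assumptions: the method name `pair_with_header` is hypothetical, and group headers are assumed to match `KOG` followed by digits, so the sketch does not depend on the member lines being indented.

```ruby
# Hypothetical helper sketching the approach from the question: remember the
# most recent KOG identifier and prefix each following entry with it.
# Assumption: every group header matches "KOG" followed by digits.
KOG_HEADER = /\AKOG\d+\z/

def pair_with_header(lines)
  key = nil
  pairs = []
  lines.each do |line|
    word = line.strip
    next if word.empty?            # ignore blank lines
    if KOG_HEADER.match?(word)
      key = word                   # a new group starts here
    elsif key
      pairs << "#{key} #{word}"    # entry belonging to the current group
    end                            # entries before the first header are dropped
  end
  pairs
end

sample = <<~DATA
  KOG0003
    At2g36170
    At3g52590
  KOG0004
    Hs20476120
    SPAC11G7.04
DATA

puts pair_with_header(sample.each_line)
```

To read a file instead of the inline sample, `File.foreach("data.dat")` returns an enumerator of lines when called without a block, so `puts pair_with_header(File.foreach("data.dat"))` should work the same way.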

  2. 2007/7/9, Marc Hoeppner <>:


    Hm... Maybe something like this:

    key = nil
    ARGF.each do |line|
      line.chomp!
      case line
      when /^(\S+)/
        key = line.strip
      when /^\s+(\S+)/
        print key, " ", $1, "\n" if key
      else
        # ignore
      end
    end

    Kind regards

    robert
    Robert Klemme, Jul 9, 2007
    #2
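Along the lines of the "three-liner" guessed at in the question, Ruby's `Enumerable#slice_before` can split the entries into groups that each begin at a header. This is a sketch under assumptions: `kog_pairs` is an illustrative name, headers are assumed to match `KOG` followed by digits, and the data is assumed to start with a header (otherwise the first group would be keyed by whatever line comes first). Returning `[key, member]` pairs rather than printing them makes the result easier to feed into database inserts.

```ruby
# Hypothetical "three-liner": split the stripped, non-blank lines into groups
# that each start at a KOG header, then pair the header with every member.
# Assumption: headers match "KOG" followed by digits and the input starts
# with a header line.
def kog_pairs(lines)
  words = lines.map(&:strip).reject(&:empty?)
  groups = words.slice_before { |w| w.match?(/\AKOG\d+\z/) }
  groups.flat_map { |key, *members| members.map { |m| [key, m] } }
end

data = ["KOG0003\n", "  At2g36170\n", "  CE15495\n", "KOG0004\n", "  YIL148w\n"]
kog_pairs(data).each { |key, member| puts "#{key} #{member}" }
```

A header with no members simply contributes no pairs, since `members` is empty for that group.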

  3. On 9 Jul 2007, at 16:42, Marc Hoeppner wrote:



    Not a very fancy solution, but it seems to work for the data you
    posted. Also uses the pattern you suggested, storing the KOG*
    identifier in a variable (field1):

    [alexg@powerbook]/Users/alexg/Desktop(7): cat test.rb
    field1 = nil
    IO.foreach(ARGV[0]) do |l|
      if l.match(/^(\S+)/)
        field1 = $1
      else
        puts "#{field1} #{l.strip}"
      end
    end
    [alexg@powerbook]/Users/alexg/Desktop(8): cat data.dat
    KOG0003
      At2g36170
      At3g52590
      CE15495
      7295730
    KOG0004
      Hs20476120
      YIL148w
      YKR094c
      SPAC11G7.04
    [alexg@powerbook]/Users/alexg/Desktop(9): ruby test.rb data.dat
    KOG0003 At2g36170
    KOG0003 At3g52590
    KOG0003 CE15495
    KOG0003 7295730
    KOG0004 Hs20476120
    KOG0004 YIL148w
    KOG0004 YKR094c
    KOG0004 SPAC11G7.04

    Alex Gutteridge

    Bioinformatics Center
    Kyoto University
    Alex Gutteridge, Jul 9, 2007
    #3
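Since the pairs are meant to go into a relational database, one option is to write them as tab-separated text, which most databases can bulk-import. The sketch below uses Ruby's standard CSV library with `col_sep: "\t"` so that quoting is handled if a field ever contains a tab or newline; the helper name `to_tsv` and the column names `kog_id`/`member_id` are assumptions for illustration, not part of the original data.

```ruby
require "csv"

# Hypothetical helper: turn [key, member] rows into tab-separated text with a
# header row. The column names are illustrative assumptions.
def to_tsv(rows, headers = %w[kog_id member_id])
  CSV.generate(col_sep: "\t") do |csv|
    csv << headers
    rows.each { |row| csv << row }   # one line per [key, member] pair
  end
end

rows = [%w[KOG0003 At2g36170], %w[KOG0003 7295730], %w[KOG0004 SPAC11G7.04]]
print to_tsv(rows)
```

Something like `File.write("pairs.tsv", to_tsv(rows))` would save the result (the file name is hypothetical); the exact import command depends on the database being used.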
  4. Marc Hoeppner, Jul 9, 2007
    #4

Similar Threads
  1. Curt
    Replies: 3, Views: 1,856
    Sahil Malik, Jun 18, 2004
  2. Chris Lane
    Replies: 3, Views: 377
    Chris Lane, Nov 17, 2003
  3. Draz

    XML text reformatting

    Draz, Jul 25, 2005, in forum: XML
    Replies: 0, Views: 374
  4. iwawi

    text file reformatting

    iwawi, Oct 31, 2010, in forum: Python
    Replies: 8, Views: 221
    iwawi, Nov 3, 2010
  5. Adam Akhtar
    Replies: 19, Views: 335
    Adam Akhtar, Apr 28, 2009