Splitting a multirecord per file format to a single record per file format: Right approach?

Discussion in 'Ruby' started by Randy Kramer, Jan 12, 2007.

  1. Randy Kramer

    Randy Kramer Guest

    I'm trying to write essentially what I guess you'd call a filter (or maybe not
    quite exactly). It needs to:

    * read multi-line records from a file (one record at a time)
    * then, with that one record:
    * prepend some additional lines
    * make substitutions for some of the lines already in the record
    * grab some other portions of the record (less than a line, but usually
    multiple words), find the "non-null" pieces, and incorporate those in
    another header line
    * create a unique filename
    * write that (single) record to that file
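
    The per-record steps above can be sketched roughly as follows. This is only an illustration: the method name convert_record, the header lines, the 'foo' to 'bar' substitution, and the record_NNNN.txt naming scheme are all hypothetical placeholders, not anything taken from the real files.

```ruby
require 'tmpdir'

# Hypothetical helper: turns one record (a String) into one output file.
# Every rule in here is a placeholder for the real header/substitution logic.
def convert_record(record, index, out_dir)
  body   = record.gsub('foo', 'bar')          # placeholder substitution
  words  = record.scan(/\w+/).first(3)        # placeholder "non-null pieces"
  header = ["Title: #{words.join(' ')}", "Record: #{index}"]

  # A zero-padded counter gives a unique filename per record.
  path = File.join(out_dir, format('record_%04d.txt', index))
  File.write(path, (header + ['', body]).join("\n"))
  path
end

# Usage with a throwaway directory and a made-up record.
Dir.mktmpdir do |dir|
  path = convert_record("foo one two", 1, dir)
  puts File.basename(path)
  puts File.read(path)
end
```

    Keeping the per-record work in one method means the reading loop only has to hand over each record and a counter.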

    I got started (maybe) by finding a likely looking piece of code in the Ruby
    Cookbook, and tried to modify it to fit my situation:

    open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| p record } }

    At this point, I'm stuck, and need some clues to move forward. (In addition,
    I have a few less essential questions, below.)

    I think the next step is, within the code block / continuation (is that (or
    one of those) the right name?), to slurp the entire record into a string,
    prepend the additional lines, do the substitutions, ..., and finally write a
    single record to the new filename.

    Main Question:

    Am I on the right track, or must I take some different approach to be able to
    process the content of a single record at a time? I ask because I did a little
    experiment (possibly a bad experiment ;-) like this:

    rec_num = 0

    open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| rec_num = rec_num + 1 } }

    p rec_num

    It only counts to one--instead of 70 to reflect the 70 records I know are in
    that particular file (and which are all printed out with the earlier version
    which has the line "{ |record| p record }").
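
    A possible explanation, assuming the file really contains the four raw bytes 0x80 0x81 0x82 0x83: in Ruby, single-quoted strings do not process \x escapes, so '\x80\x81\x82\x83' is a 16-character string of backslashes, letters and digits that never occurs in the file. each would then yield the whole file as a single record, which would also explain why the "p record" version appeared to print everything. The sketch below uses made-up sample data to show the difference.

```ruby
single = '\x80\x81\x82\x83'      # 16 literal characters: \ x 8 0 \ x 8 1 ...
double = "\x80\x81\x82\x83".b    # 4 raw bytes, tagged as binary

puts single.bytesize   # 16
puts double.bytesize   # 4

# Made-up sample data: three pieces of text separated by the 4-byte marker.
data = "one\n".b + double + "two\n" + double + "three\n"

count_single = 0
data.each_line(single) { |_record| count_single += 1 }

count_double = 0
data.each_line(double) { |_record| count_double += 1 }

puts count_single   # separator never matches, so everything is one "record"
puts count_double   # one piece per separator, plus the trailing piece
```

    String#each_line is used here only to keep the example self-contained; IO#each accepts a separator argument in the same way.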

    Other questions: (I could start a thread for each, but I'll start this way and
    split them up if I either get too much or not enough response ;-)

    1. What is the right name for that construction: is that a continuation, a
    (code?) block, or something else? (Is it possible that Ruby calls this a
    code block and some other languages call it a continuation, or is it an
    example of one kind of continuation available in Ruby?)

    2. What's the story on white space in that kind of structure. I experimented
    with trying to format it to make it (possibly) easier to read, something like
    this:

    open('/rhk/work/ask_notes/politics.twk') {
    |f| f.each('\x80\x81\x82\x83') {
    |record| p record

    <anticipated location of code to process a single record>

    }
    }

    But any whitespace (i.e., newlines) that I added just caused syntax errors.
    Is there a way to "prettyformat" that structure?
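
    One layout that is commonly used for multi-line blocks is do ... end, with the block parameters kept on the same line as the opening do. The sketch below assumes a double-quoted separator and writes its own temporary sample file, so the file contents and the SEPARATOR constant name are illustrative only.

```ruby
require 'tempfile'

SEPARATOR = "\x80\x81\x82\x83".b   # four raw bytes

# Hypothetical sample file standing in for the real input.
sample = Tempfile.new('records')
sample.binmode
sample.write("alpha\n" + SEPARATOR + "beta\n" + SEPARATOR + "gamma\n")
sample.close

records = []
File.open(sample.path, 'rb') do |f|
  f.each(SEPARATOR) do |record|
    # code to process a single record goes here
    records << record.chomp(SEPARATOR)   # drop the trailing separator
  end
end

p records.size
```

    Braces also work across several lines; the do ... end form is simply the more usual convention when a block spans more than one line.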

    3. The content of the files I have to convert is actually more like this:

    <bof>
    Record header ('\x80\x81\x82\x83')

    Record (with blank lines)
    (trailing blank line)
    Record header ('\x80\x81\x82\x83')

    Record (with blank lines)
    (trailing blank line)
    Record header ('\x80\x81\x82\x83')

    Record (with blank lines)
    <eof>

    The Ruby code that I copied from the Ruby Cookbook is more aimed at separating
    records that end with a record separator (instead of starting with a record
    header). I can work this way--I mean, worst case I modify every input file
    to do something like remove the first record header from the file and add a
    record header at the end of the file, but that's probably not really
    necessary.

    But, it seems like I'm using not quite the right tool. Is there a better
    approach that more exactly fits the format of my files?
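
    For files where each record starts with a header rather than ending with a separator, one option is to read the whole file and split on the header, then discard the empty piece that precedes the first header. This is a minimal sketch on made-up data, assuming the header is exactly the four raw bytes and the file is small enough to hold in memory; HEADER is an illustrative name.

```ruby
HEADER = "\x80\x81\x82\x83".b   # four raw bytes

# Made-up data in the "header first" layout described above.
data = HEADER + "\nfirst record\n\n" + HEADER + "\nsecond record\n"

# split leaves an empty string before the first header; reject removes it.
records = data.split(HEADER).reject(&:empty?)

records.each_with_index do |record, i|
  puts "record #{i + 1}: #{record.strip}"
end
```

    With this layout no input file needs to be edited to move the header from the start to the end.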

    Thanks!
    Randy Kramer
     
    Randy Kramer, Jan 12, 2007

  2. Re: Splitting a multirecord per file format to a single record per

    On 12.01.2007 15:02, Randy Kramer wrote:
    > I'm trying to write essentially what I guess you'd call a filter (or maybe not
    > quite exactly). It needs to:
    >
    > * read multi-line records from a file (one record at a time)
    > * then, with that one record:
    > * prepend some additional lines
    > * make substitutions for some of the lines already in the record
    > * grab some other portions of the record (less than a line, but usually
    > multiple words), find the "non-null" pieces, and incorporate those in
    > another header line
    > * create a unique filename
    > * write that (single) record to that file

    [...]
    > 3. The content of the files I have to convert is actually more like this:
    >
    > <bof>
    > Record header ('\x80\x81\x82\x83')
    >
    > Record (with blank lines)
    > (trailing blank line)
    > Record header ('\x80\x81\x82\x83')
    >
    > Record (with blank lines)
    > (trailing blank line)
    > Record header ('\x80\x81\x82\x83')
    >
    > Record (with blank lines)
    > <eof>


    You could do:

    # create a class for your records or use OpenStruct
    YourRecord = Struct.new(:name, :length, :foo, :bar) do
      def dump
        # file_name is not defined in this sketch; derive it from the record
        File.open(file_name, "w") do |io|
          # whatever
        end
      end
    end


    current = nil

    File.foreach('your file') do |line|
      line.chomp!

      case line
      when /^<bof>$/
        current = YourRecord.new
      when /^<eof>$/
        current.dump
        current = nil
      when /Record header/
        # ...
      else
        # ignore or whatever
      end
    end
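
    A variation on the same line-by-line idea, as a sketch only: it assumes the record header is a line beginning with the four raw bytes (rather than literal <bof>/<eof> text), drops that header line, and collects everything up to the next header. The names each_record and HEADER are hypothetical.

```ruby
require 'tempfile'

HEADER = "\x80\x81\x82\x83".b   # four raw bytes

# Yields each record (the lines following a header line) as one String.
def each_record(path)
  current = nil
  File.open(path, 'rb') do |f|
    f.each_line do |line|
      if line.start_with?(HEADER)
        yield current if current      # previous record is complete
        current = String.new          # start a new, empty record
      elsif current
        current << line               # lines before the first header are ignored
      end
    end
  end
  yield current if current            # last record has no header after it
end

# Usage with a made-up sample file.
sample = Tempfile.new('records')
sample.binmode
sample.write(HEADER + "\n\nfirst\n\n" + HEADER + "\n\nsecond\n")
sample.close

found = []
each_record(sample.path) { |record| found << record }
p found.size
```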

    Kind regards

    robert
     
    Robert Klemme, Jan 12, 2007

  3. Randy Kramer

    Randy Kramer Guest

    On Friday 12 January 2007 09:15 am, Robert Klemme wrote:
    > On 12.01.2007 15:02, Randy Kramer wrote:
    > > I'm trying to write essentially what I guess you'd call a filter (or
    > > maybe not quite exactly). It needs to:


    > You could do:


    Thanks--that will get me started!

    Randy Kramer
     
    Randy Kramer, Jan 12, 2007
