Splitting a multirecord per file format to a single record per file format: Right approach?


Randy Kramer

I'm trying to write essentially what I guess you'd call a filter (or maybe not
quite exactly). It needs to:

* read multi-line records from a file (one record at a time)
* then, with that one record:
* prepend some additional lines
* make substitutions for some of the lines already in the record
* grab some other portions of the record (less than a line, but usually
multiple words), find the "non-null" pieces, and incorporate those in
another header line
* create a unique filename
* write that (single) record to that file

I got started (maybe) by finding a likely looking piece of code in the Ruby
Cookbook, and tried to modify it to fit my situation:

open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| p record } }

At this point, I'm stuck, and need some clues to move forward. (In addition,
I have a few less essential questions, below.)

I think the next step is, within the code block / continuation (is that (or
one of those) the right name?), to slurp the entire record into a string,
prepend the additional lines, do the substitutions, ..., and finally write a
single record to the new filename.

Main Question:

Am I on the right track, or must I take a different approach to be able to
process the content of a single record at a time? I ask because I did a little
experiment (possibly a bad experiment ;-) like this:

rec_num = 0

open('/rhk/work/ask_notes/politics.twk') { |f| f.each('\x80\x81\x82\x83') { |record| rec_num = rec_num + 1 } }

p rec_num

It only counts to one--instead of 70 to reflect the 70 records I know are in
that particular file (and which are all printed out with the earlier version
which has the line "{ |record| p record }").

Other questions: (I could start a thread for each, but I'll start this way and
split them up if I either get too much or not enough response ;-)

1. What is the right name for that construction: is that a continuation, a
(code?) block, or something else? (Is it possible that Ruby calls this a
code block and some other languages call it a continuation, or is it an
example of one kind of continuation available in Ruby?)

2. What's the story on whitespace in that kind of structure? I experimented
with trying to format it to make it (possibly) easier to read, something like
this:

open('/rhk/work/ask_notes/politics.twk') {
|f| f.each('\x80\x81\x82\x83') {
|record| p record

<anticipated location of code to process a single record>

}
}

But any whitespace (i.e., newlines) that I added just caused syntax errors.
Is there a way to "prettyformat" that structure?
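
Ruby's own documentation calls the `{ |f| ... }` construct a block. A common layout for multi-line blocks is `do ... end` with the block parameters kept on the same line as `do`. The following is a minimal sketch of that layout, not a diagnosis of the syntax errors above; `read_records` and `RECORD_SEP` are hypothetical names, and the demo file contents are made up.

```ruby
require 'tempfile'

RECORD_SEP = "\x80\x81\x82\x83".b  # double-quoted so the escapes become bytes

# Collect the records of a file, using one do...end block per nesting level.
def read_records(path)
  records = []
  File.open(path, 'rb') do |f|
    f.each(RECORD_SEP) do |record|
      # code to process a single record would go here
      records << record
    end
  end
  records
end

# Small demonstration with a temporary file.
Tempfile.create('demo') do |tmp|
  tmp.binmode
  tmp.write("one\n" + RECORD_SEP + "two\n")
  tmp.flush
  puts read_records(tmp.path).length
end
```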

3. The content of the files I have to convert is actually more like this:

<bof>
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
<eof>

The Ruby code that I copied from the Ruby Cookbook is more aimed at separating
records that end with a record separator (instead of starting with a record
header). I can work this way--I mean, worst case I modify every input file
to do something like remove the first record header from the file and add a
record header at the end of the file, but that's probably not really
necessary.

But, it seems like I'm using not quite the right tool. Is there a better
approach that more exactly fits the format of my files?

Thanks!
Randy Kramer
 

Robert Klemme

I'm trying to write essentially what I guess you'd call a filter (or maybe not
quite exactly). It needs to:

* read multi-line records from a file (one record at a time)
* then, with that one record:
* prepend some additional lines
* make substitutions for some of the lines already in the record
* grab some other portions of the record (less than a line, but usually
multiple words), find the "non-null" pieces, and incorporate those in
another header line
* create a unique filename
* write that (single) record to that file [...]
3. The content of the files I have to convert is actually more like this:

<bof>
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
(trailing blank line)
Record header ('\x80\x81\x82\x83')

Record (with blank lines)
<eof>

You could do:

# create a class for your records or use OpenStruct
YourRecord = Struct.new :name, :length, :foo, :bar do
  def dump()
    # file_name is a placeholder here; supply a unique name per record
    File.open(file_name, "w") do |io|
      # whatever
    end
  end
end

current = nil

File.foreach('your file') do |line|
  line.chomp!

  case line
  when /^<bof>$/
    current = YourRecord.new
  when /^<eof>$/
    current.dump
    current = nil
  when /Record header/
    # ...
  else
    # ignore or whatever
  end
end

Kind regards

robert
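
For records that begin with a header rather than end with a separator, another option is to read the whole file as bytes and split on the header. The sketch below follows the steps listed in the question (prepend lines, substitute, build a unique filename, write one record per file) under stated assumptions: the file fits in memory, and `split_records`, `transform`, `write_records`, the prepended header line, the substitution and the file naming scheme are all hypothetical placeholders.

```ruby
require 'tmpdir'

# Record header bytes from the question; .b treats the string as raw bytes.
HEADER = "\x80\x81\x82\x83".b

# Split raw file contents into records; the empty piece before the first
# header is discarded. The header bytes themselves are not kept.
def split_records(data)
  data.b.split(HEADER).reject(&:empty?)
end

# Hypothetical per-record processing: one substitution, one prepended line.
def transform(record, index)
  body = record.gsub("foo", "bar")           # placeholder substitution
  "X-Record-Number: #{index}\n".b + body     # placeholder header line
end

# Write each record to its own numbered file and return the paths.
def write_records(data, out_dir)
  split_records(data).each_with_index.map do |record, i|
    path = File.join(out_dir, format("record_%04d.txt", i + 1))
    File.binwrite(path, transform(record, i + 1))
    path
  end
end

# Demonstration with made-up data in a temporary directory.
sample = HEADER + "\nfirst foo\n\n" + HEADER + "\nsecond\n"
Dir.mktmpdir do |dir|
  write_records(sample, dir).each do |path|
    puts "#{File.basename(path)}: #{File.binread(path).inspect}"
  end
end
```

For a real file, something like `write_records(File.binread(input_path), output_dir)` would be the entry point; for very large inputs a line-by-line state machine like the one above may be the better fit.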
 
