Replace Text at Specific Positions Across Files

Shiny Hydra

Hello everyone,

I'm new to Ruby and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I'm currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files, and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with the .aaa extension, male should be written in
positions 30-34 in a file with the .bbb extension. There can be multiple
lines, each relating to a different object.

After looking through the file and directory classes I can't find an
obvious way to code this. How would I write/overwrite a specific
position in a file with a certain extension based on information from another
file? I know that to read the information I can use readlines and extract the
field using something similar to textfile1[10,7], and that I can use
File.extname to get the extension, but beyond this I'm stuck. I
apologize for the basic question, but I would greatly appreciate the
help.

Thanks!
 
Robert Klemme

2010/3/17 Shiny Hydra said:
I'm new to Ruby and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I'm currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files, and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with the .aaa extension, male should be written in
positions 30-34 in a file with the .bbb extension. There can be multiple
lines, each relating to a different object.

So your file has fixed width records? This is important to know,
otherwise approach 2 below becomes tricky (you basically need to
read line by line to find the proper position, whereas otherwise you
can calculate the position from the record size).
After looking through the file and directory classes I can't find an
obvious way to code this. How would I write/overwrite a specific
position in a file with a certain extension based on information from another
file? I know that to read the information I can use readlines and extract the
field using something similar to textfile1[10,7], and that I can use
File.extname to get the extension, but beyond this I'm stuck. I
apologize for the basic question, but I would greatly appreciate the
help.

You have basically two options:

1. do it in memory

2. do it on disk

ad 1: You can read a complete file by using IO#read into a String,
then you manipulate it via String manipulation methods and write it
out. This obviously only works up to a certain file size.
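A minimal sketch of approach 1 for the .aaa/.bbb example from the question. It assumes ASCII data, "\n" line endings, .bbb records of a hypothetical 40 characters, 0-based positions (so "genderM" is read from columns 10-16 and "male" is written starting at column 30), and, as a simplifying assumption, that record n of the .aaa file belongs with record n of the .bbb file. `copy_gender_in_memory` and `BBB_WIDTH` are names made up for illustration.

```ruby
# Hypothetical width of a .bbb record, not counting the trailing "\n".
BBB_WIDTH = 40

# Approach 1 (in memory): read both files completely, patch the .bbb
# contents as a String, then write the whole String back out.
# Assumes record n of the .aaa file corresponds to record n of the .bbb file.
def copy_gender_in_memory(aaa_path, bbb_path)
  aaa_lines = File.readlines(aaa_path, chomp: true)
  bbb = File.read(bbb_path)

  aaa_lines.each_with_index do |line, n|
    next unless line[10, 7] == "genderM"   # columns 10-16, 0-based

    start = n * (BBB_WIDTH + 1) + 30        # +1 for the "\n" after each record
    bbb[start, 4] = "male"                  # replaces exactly 4 characters
  end

  File.write(bbb_path, bbb)
end
```

Because `String#[]=` replaces exactly as many characters as the new value has here, every record keeps its fixed width.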

ad 2: Use IO#seek to find the proper write position and use IO#write
to overwrite bytes at this position.
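Approach 2 can be sketched like this, again with a hypothetical record size of 41 bytes (40 data bytes plus "\n") and 0-based positions; `overwrite_field` and `BBB_RECORD_BYTES` are illustrative names. Opening with mode "r+" is the important part: it allows writing without truncating the file, so only the bytes at the seek position are replaced.

```ruby
# Hypothetical size of one .bbb record in bytes: 40 data bytes plus "\n".
BBB_RECORD_BYTES = 41

# Approach 2 (on disk): jump straight to the target bytes and overwrite them.
# Mode "r+b" opens for reading and writing without truncating the file.
def overwrite_field(path, record_no, col, value)
  File.open(path, "r+b") do |f|
    f.seek(record_no * BBB_RECORD_BYTES + col, IO::SEEK_SET)
    f.write(value)   # overwrites value.bytesize bytes; nothing is inserted
  end
end
```

Since `write` never inserts or deletes, the value must be exactly as wide as the field; a shorter value would leave old bytes behind, so pad it first (e.g. with `String#ljust`).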

Btw, is there a particular reason why you create what looks like a
relational database based on text files?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Shiny Hydra

So your file has fixed width records? This is important to know,
otherwise approach 2 below becomes tricky (you basically need to
read line by line to find the proper position, whereas otherwise you
can calculate the position from the record size).

Yes, the records are all fixed width. The record width depends on
the extension type.
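With fixed-width records whose width depends on the extension, the byte offset of any record can be computed rather than searched for. A sketch, assuming a lookup table keyed by `File.extname`; the widths and the "\n" terminator are placeholders, not values from the Mail.dat specification, and `record_bytes`/`record_offset` are hypothetical helpers.

```ruby
# Hypothetical record widths (data bytes, excluding the line terminator).
RECORD_WIDTHS = { ".aaa" => 20, ".bbb" => 40 }.freeze
NEWLINE_BYTES = 1  # assumes "\n" line endings; would be 2 for "\r\n"

# Bytes per record for the given file, chosen by its extension.
def record_bytes(path)
  ext = File.extname(path).downcase
  width = RECORD_WIDTHS.fetch(ext) { raise ArgumentError, "unknown record type: #{path}" }
  width + NEWLINE_BYTES
end

# Byte offset of record n (0-based), optionally plus a column within it.
def record_offset(path, n, col = 0)
  n * record_bytes(path) + col
end
```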
Btw, is there a particular reason why you create what looks like a
relational database based on text files?

I'm working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely
time-consuming process, so I'm trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

I'm currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go into much depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it's difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

Thanks again!
 
Robert Klemme

2010/3/18 Shiny Hydra said:
Yes, the records are all fixed width. The record width depends on
the extension type.


I'm working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely
time-consuming process, so I'm trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

That's good! I hope you continue to enjoy your journey.
I'm currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go into much depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it's difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

You could start by searching the ruby-talk archives for "File"
and "seek". That should give you some bits of code that deal with
file I/O other than sequential reading or writing.
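To illustrate the kind of non-sequential I/O meant here, the following sketch reads a single record by seeking to its start instead of reading every line before it. `read_record` and the record size passed to it are illustrative assumptions.

```ruby
# Reads record n (0-based) from a file of fixed-size records without
# reading any of the records before it. Returns nil past the end of file.
def read_record(path, n, record_bytes)
  File.open(path, "rb") do |f|
    f.seek(n * record_bytes)          # absolute position (IO::SEEK_SET)
    data = f.read(record_bytes)       # nil when already at end of file
    data&.chomp                       # drop the trailing "\n", if any
  end
end
```

`IO.read(path, length, offset)` performs the same seek-and-read in a single call.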

Kind regards

robert

Josh Cheek

[Note: parts of this message were removed to make it a legal post.]

That's good! I hope you continue to enjoy your journey.


You could start by searching the ruby-talk archives for "File"
and "seek". That should give you some bits of code that deal with
file I/O other than sequential reading or writing.

Kind regards

robert
This got me excited; my file manipulation isn't very good, so I thought I'd
give it a try. Here is what I got: http://gist.github.com/336838
 
Robert Klemme

This got me excited; my file manipulation isn't very good, so I thought I'd
give it a try. Here is what I got: http://gist.github.com/336838

I don't have the time to go through your code completely. A few remarks
anyway:

- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

- Your method #validate is invoked on individual fields, but what you really
want is to check the complete record length.

- You can use Struct to easily define Record in fewer lines:

Record = Struct.new :title, :id, :from, :to, :offset

- I would not store the position of the record in the Record instance
because that way you mix business logic state (record contents) and
storage related state. If you have another storage medium your position
in the Record will be superfluous.

- Your MailRecord is a good abstraction of the file storage.

- I would not use puts with your fixed size records. Rather I would use
write and read. Plus, I'd place those methods in MailRecord and not in
Record because they are specific to this particular storage medium.
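Following these remarks, a sketch of how the pieces might be split: `Record` is a plain Struct without an offset, while the storage class (here named `MailFile`, as suggested) owns the fixed-width layout and checks the complete record length once instead of per field. The field widths are hypothetical, and `String#unpack`/`Array#pack` with the `A` directive (space-padded string) are one way to do the conversion, not necessarily what the gist does.

```ruby
# Record holds only business data; no file offset (per the advice above).
Record = Struct.new(:title, :id, :from, :to)

# Storage-specific knowledge (field widths, padding) lives in the file class.
# The widths are hypothetical; a real layout would come from the format spec.
class MailFile
  FIELD_WIDTHS = [10, 6, 8, 8].freeze                # title, id, from, to
  RECORD_LENGTH = FIELD_WIDTHS.sum                   # 32 data bytes
  TEMPLATE = FIELD_WIDTHS.map { |w| "A#{w}" }.join   # "A10A6A8A8"

  # Turns one fixed-width line into a Record, validating the whole length once.
  def self.decode(line)
    data = line.chomp
    unless data.bytesize == RECORD_LENGTH
      raise ArgumentError, "expected #{RECORD_LENGTH} bytes, got #{data.bytesize}"
    end
    Record.new(*data.unpack(TEMPLATE))   # "A" strips trailing spaces
  end

  # Pads each field with spaces to its width ("A" also truncates long values).
  def self.encode(record)
    record.to_a.pack(TEMPLATE)
  end
end
```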

Kind regards

robert
 
Shiny Hydra

This got me excited; my file manipulation isn't very good, so I thought I'd
give it a try. Here is what I got: http://gist.github.com/336838

Thank you very much for writing up a sample of how I could begin making
a script to manipulate this data. This provides a fantastic spot to jump
in and begin building this application!
 
Josh Cheek

[Note: parts of this message were removed to make it a legal post.]

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert
Hi, Robert
I don't have the time to go through your code completely. A few remarks
anyway:

That is fine, I appreciate what you did take the time to go through :)
- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I'm not sure what you mean by class
comments, though.

- Your method #validate is invoked on individual fields, but what you really
want is to check the complete record length.

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

- You can use Struct to easily define Record in fewer lines:
Record = Struct.new :title, :id, :from, :to, :offset

That makes a lot of sense; thanks for the tip. I default to OO first
because that is what I am most familiar with, so structs don't come quickly
to mind. I have read your article
http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html probably three
times, but it usually only comes to mind when my structure is very dynamic, i.e. OpenStruct.

- Your MailRecord is a good abstraction of the file storage.

Thank you :)

- I would not store the position of the record in the Record instance
because that way you mix business logic state (record contents) and
storage related state. If you have another storage medium your position
in the Record will be superfluous.

- I would not use puts with your fixed size records. Rather I would use
write and read. Plus, I'd place those methods in MailRecord and not in
Record because they are specific to this particular storage medium.

I understand your points here, but I am having difficulty thinking of a way
to implement this. Perhaps the record should have a reference to the
MailRecord (or MailFile, as you suggest), and then tell the file to write
itself? But the position thing still seems to be an issue.

Maybe the problem is that I am considering it to be almost an array of records
based on position in the file, but I should remove index/offset from
consideration and instead consider it as a set, where ordering is arbitrary
and can be altered as necessary to accommodate encapsulated logic and data
integrity.


I'm not really sure what a proper approach would look like here.

Anyway, thanks for taking the time to look and comment :)
 
Josh Cheek

[Note: parts of this message were removed to make it a legal post.]

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

Anyway, thanks for taking the time to look and comment :)

I just realized this is false, because I was using readline. Initially I was
calculating the offset of each attribute, but then I removed that. So you
are correct: I could just validate the total length, which would be easier
and cleaner.
 
Robert Klemme

[Note: parts of this message were removed to make it a legal post.]

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert
Hi, Robert
I don't have the time to go through your code completely. A few remarks
anyway:

That is fine, I appreciate what you did take the time to go through :)
- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I'm not sure what you mean by class
comments, though.

A comment that describes what the class is about.
Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

So you are saying that all fields in the record have the same length? I
thought LINE_WIDTH would refer to the record's length.
That makes a lot of sense; thanks for the tip. I default to OO first
because that is what I am most familiar with, so structs don't come quickly
to mind. I have read your article
http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html probably three
times, but it usually only comes to mind when my structure is very dynamic, i.e. OpenStruct.

Thank you :)



I understand your points here, but I am having difficulty thinking of a way
to implement this. Perhaps the record should have a reference to the
MailRecord (or MailFile, as you suggest), and then tell the file to write
itself? But the position thing still seems to be an issue.

Maybe the problem is that I am considering it to be almost an array of records
based on position in the file, but I should remove index/offset from
consideration and instead consider it as a set, where ordering is arbitrary
and can be altered as necessary to accommodate encapsulated logic and data
integrity.


I'm not really sure what a proper approach would look like here.

Maybe you could have a method in MailFile that writes record n.
Internally it would seek to n * record length and then write the record
passed. Your caching functionality (i.e. keeping read records) could
probably better go into another class - which potentially wraps MailFile.
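A sketch of that split, assuming the same hypothetical 32-byte layout plus a "\n" terminator: `MailFile` only knows how to read and write record n at offset n * record length, and a separate wrapper (here called `CachedMailFile`, a made-up name) keeps records that have already been read.

```ruby
Record = Struct.new(:title, :id, :from, :to)

# Fixed-width storage: reads and writes record n in place, no caching.
class MailFile
  TEMPLATE = "A10A6A8A8"      # hypothetical widths: title, id, from, to
  DATA_BYTES = 32
  LINE_BYTES = DATA_BYTES + 1 # plus the trailing "\n"

  def initialize(path)
    @path = path
  end

  # Seeks to n * LINE_BYTES and decodes exactly one record.
  def read_record(n)
    File.open(@path, "rb") do |f|
      f.seek(n * LINE_BYTES)
      line = f.read(LINE_BYTES)
      raise IndexError, "no record #{n}" if line.nil?
      Record.new(*line.chomp.unpack(TEMPLATE))
    end
  end

  # Overwrites record n in place; the other records are untouched.
  # pack pads short fields with spaces and silently truncates long ones.
  def write_record(n, record)
    File.open(@path, "r+b") do |f|
      f.seek(n * LINE_BYTES)
      f.write(record.to_a.pack(TEMPLATE) + "\n")
    end
  end
end

# Caching layer kept separate from the storage class; it wraps a MailFile.
class CachedMailFile
  def initialize(mail_file)
    @file = mail_file
    @cache = {}
  end

  def read_record(n)
    @cache[n] ||= @file.read_record(n)
  end

  def write_record(n, record)
    @file.write_record(n, record)
    @cache[n] = record
  end
end
```

Keeping the cache outside `MailFile` keeps the storage class simple; the trade-off is that the wrapper cannot notice changes made to the file by other processes.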

Kind regards

robert
 
