Replace Text at Specific Positions Across Files

Shiny Hydra · Mar 17, 2010

Hello everyone,

I'm new to Ruby and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I'm currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with .aaa extension, male should be written in
positions 30-34 in a file with .bbb extensions. There can be multiple
lines, each relating to a different object.

After looking through the file and directory classes I can't find an
obvious way to code this. How would I write/overwrite a specific
position number in a certain extension based on information from another
file? I know to read the information I just use readlines and store the
position using something similar to textfile1[10,7] and that I can use
file.extname to get the extension, but beyond this I'm stuck. I
apologize for the basic question, but I would greatly appreciate the
help.

Thanks!

Robert Klemme · Mar 18, 2010

2010/3/17 Shiny Hydra said:
I'm new to Ruby and after trying to look through a ton of classes and
methods, I decided it would be best to ask some more seasoned
individuals for help. I'm currently working on a project that
essentially deals with a relational DB in text format. There is a
standard layout throughout a variety of text files and each of them has
corresponding information. For example, if positions 10-17 are populated
with genderM in a file with .aaa extension, male should be written in
positions 30-34 in a file with .bbb extensions. There can be multiple
lines, each relating to a different object.

So your file has fixed width records? This is important to know,
otherwise approach 2 from below becomes tricky (you basically need to
read line by line in order to find the proper position whereas you
otherwise can calculate the position via record size).
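
To illustrate the calculation mentioned above: with fixed-width records, the byte offset of a field follows directly from the record number. This is only a minimal sketch. The record size of 80 bytes and the helper name `field_offset` are assumptions made for illustration, and positions are taken to be 1-based as in the original question.

```ruby
# Hypothetical layout: every record (line) occupies RECORD_SIZE bytes,
# including the trailing newline. The value is an assumption.
RECORD_SIZE = 80

# Byte offset of a 1-based column `position` inside the 0-based record `index`.
def field_offset(index, position, record_size = RECORD_SIZE)
  index * record_size + (position - 1)
end

field_offset(0, 30)  # => 29
field_offset(2, 30)  # => 189
```

With variable-length lines no such formula exists, which is why the file would have to be scanned line by line instead.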

After looking through the file and directory classes I can't find an
obvious way to code this. How would I write/overwrite a specific
position number in a certain extension based on information from another
file? I know to read the information I just use readlines and store the
position using something similar to textfile1[10,7] and that I can use
file.extname to get the extension, but beyond this I'm stuck. I
apologize for the basic question, but I would greatly appreciate the
help.

You have basically two options:

1. do it in memory

2. do it on disk

ad 1: You can read a complete file by using IO#read into a String,
then you manipulate it via String manipulation methods and write it
out. This obviously only works up to a certain file size.

ad 2: Use IO#seek to find the proper write position and use IO#write
to overwrite bytes at this position.
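
Both options could look roughly like the following sketch. The record size, the field position (30-34, borrowed from the question) and the method names `patch_in_memory` and `patch_on_disk` are assumptions for illustration, not part of any real layout. `File.binread` and `File.binwrite` also assume a reasonably recent Ruby.

```ruby
require 'tempfile'

# Hypothetical layout, for illustration only: records are lines of
# RECORD_SIZE bytes (newline included) and the field to overwrite
# occupies 1-based positions 30-34.
RECORD_SIZE = 41
FIELD_START = 29   # 0-based offset of position 30
FIELD_WIDTH = 5

# Approach 1: read the whole file, patch the String, write it back.
def patch_in_memory(path, record_index, value)
  data = File.binread(path)
  offset = record_index * RECORD_SIZE + FIELD_START
  data[offset, FIELD_WIDTH] = value.ljust(FIELD_WIDTH)[0, FIELD_WIDTH]
  File.binwrite(path, data)
end

# Approach 2: seek to the field and overwrite only those bytes.
def patch_on_disk(path, record_index, value)
  File.open(path, 'r+b') do |f|
    f.seek(record_index * RECORD_SIZE + FIELD_START, IO::SEEK_SET)
    f.write(value.ljust(FIELD_WIDTH)[0, FIELD_WIDTH])
  end
end

# Demo on a throwaway file with two blank records.
file = Tempfile.new(['demo', '.bbb'])
file.write((' ' * 40 + "\n") * 2)
file.close

patch_in_memory(file.path, 0, 'male')
patch_on_disk(file.path, 1, 'male')
lines = File.readlines(file.path)
puts lines.map { |l| l[FIELD_START, FIELD_WIDTH].inspect }
```

Padding and truncating the value to the field width keeps every record at its original length, which both approaches rely on.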

Btw, is there a particular reason why you create what looks like a
relational database based on text files?

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Shiny Hydra · Mar 18, 2010

So your file has fixed width records? This is important to know,
otherwise approach 2 from below becomes tricky (you basically need to
read line by line in order to find the proper position whereas you
otherwise can calculate the position via record size).

Yes, the records are all fixed width. The width of the file is based on
the extension type.

Btw, is there a particular reason why you create what looks like a
relational database based on text files?

I'm working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely time
consuming process, so I'm trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

I'm currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go very in depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it's difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

Thanks again!

Robert Klemme · Mar 18, 2010

2010/3/18 Shiny Hydra said:
Yes, the records are all fixed width. The width of the file is based on
the extension type.

I'm working off of a standard format that has been used for years
(called Mail.dat). Editing the files has been an extremely time
consuming process, so I'm trying to write an automated script to batch
replace specific parameters. After doing some research, it seemed like
Ruby was a great language to learn for this type of text manipulation
and it turned out to be quite fun to boot.

That's good! I hope you continue to enjoy your journey.

I'm currently working through the book Beginning Ruby: From Novice to
Professional, but it does not go very in depth on text file manipulation
techniques. I tried looking through the classes and methods online, but
without a strong foundation in the language it's difficult to navigate
that amount of information. If you could provide any additional
information it would be immensely helpful.

You could start with searching the archives of ruby-talk for "File"
and "seek". That should give you some bits of code which deal with
file IO different from sequentially reading or writing.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Josh Cheek · Mar 18, 2010

That's good! I hope you continue to enjoy your journey.

You could start with searching the archives of ruby-talk for "File"
and "seek". That should give you some bits of code which deal with
file IO different from sequentially reading or writing.

Kind regards

robert

This got me excited, my file manipulation isn't very good, so thought I'd
give it a try. Here is what I got http://gist.github.com/336838

Robert Klemme · Mar 18, 2010

This got me excited, my file manipulation isn't very good, so thought I'd
give it a try. Here is what I got http://gist.github.com/336838

I don't have the time to go completely through your code. Few remarks
anyway:

- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

- Your method #validate is invoked on individual fields but you rather
want to check the complete record length.

- You can use Struct to easily define Record in fewer lines:

Record = Struct.new :title, :id, :from, :to, :offset

- I would not store the position of the record in the Record instance
because that way you mix business logic state (record contents) and
storage related state. If you have another storage medium your position
in the Record will be superfluous.

- Your MailRecord is a good abstraction of the file storage.

- I would not use puts with your fixed size records. Rather I would use
write and read. Plus, I'd place those methods in MailRecord and not in
Record because they are specific to this particular storage medium.
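
As a rough illustration of these remarks, the sketch below defines Record with Struct, leaves out the offset member, and keeps the read/write code outside the record. The field widths and the helper names `write_record` and `read_record` are assumptions for illustration and are not taken from the gist.

```ruby
require 'stringio'

# Record holds only business data; the :offset member is left out,
# following the remark about not storing the position in the record.
Record = Struct.new(:title, :id, :from, :to)

# Hypothetical field widths, for illustration only.
WIDTHS = { title: 10, id: 4, from: 8, to: 8 }
RECORD_LENGTH = WIDTHS.values.sum                # total bytes per record
FORMAT = WIDTHS.values.map { |w| "A#{w}" }.join  # one "A<width>" per field

# Storage-specific code lives outside Record and uses write/read, not puts.
def write_record(io, record)
  io.write(record.to_a.pack(FORMAT))      # "A" pads each field with spaces
end

def read_record(io)
  raw = io.read(RECORD_LENGTH)
  raw && Record.new(*raw.unpack(FORMAT))  # "A" strips trailing spaces
end

io = StringIO.new
write_record(io, Record.new('hello', '42', 'alice', 'bob'))
io.rewind
p read_record(io)
```

StringIO stands in for a real file here; the same two methods should work on a File opened in binary mode.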

Kind regards

robert

Shiny Hydra · Mar 19, 2010

This got me excited, my file manipulation isn't very good, so thought I'd
give it a try. Here is what I got http://gist.github.com/336838

Thank you very much for writing up a sample of how I could begin making
a script to manipulate this data. This provides a fantastic spot to jump
in and begin building this application!

Robert Klemme · Mar 19, 2010

2010/3/19 Shiny Hydra said:
Thank you very much for writing up a sample of how I could begin making
a script to manipulate this data. This provides a fantastic spot to jump
in and begin building this application!

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert

James Edward Gray II · Mar 19, 2010

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Yeah, the Gateway is having trouble talking to our Usenet host. I've
emailed him about it.

James Edward Gray II

Josh Cheek · Mar 19, 2010

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert

Hi, Robert

I don't have the time to go completely through your code. Few remarks
anyway:

That is fine, I appreciate what you did take the time to go through

- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I'm not sure what you mean by Class
comments, though.

- Your method #validate is invoked on individual fields but you rather
want to check the complete record length.

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

- You can use Struct to easily define Record in fewer lines:

Record = Struct.new :title, :id, :from, :to, :offset

That makes a lot of sense, thanks for the tip, I default to OO first,
because that is what I am most familiar with, so structs don't come quickly
to mind. Though I have read your article probably three times
(http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html), it usually
only comes to mind when my structure is very dynamic, i.e. OpenStruct.

- Your MailRecord is a good abstraction of the file storage.

Thank you

- I would not store the position of the record in the Record instance
because that way you mix business logic state (record contents) and
storage related state. If you have another storage medium your position
in the Record will be superfluous.

- I would not use puts with your fixed size records. Rather I would use
write and read. Plus, I'd place those methods in MailRecord and not in
Record because they are specific to this particular storage medium.

I understand your points here, but I am having difficulty thinking of a way
to implement this. Perhaps the record should have a reference to the
MailRecord (or MailFile, as you suggest), and then tell the file to write
itself? But the position thing still seems to be an issue.

Maybe the problem is that I am considering it to be almost an array of files
based on position in the file, but I should remove index/offset from
consideration and instead consider it as a set, where ordering is arbitrary
and can be altered as necessary to accommodate encapsulated logic and data
integrity.

I'm not really sure what a proper approach would look like here.

Anyway, thanks for taking the time to look and comment

Josh Cheek · Mar 19, 2010

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

Anyway, thanks for taking the time to look and comment

I just realized this is false, because I was using readline. Initially I was
calculating the offset of each attribute, but then I removed that. So you
are correct, I could just validate the total length. It would be easier and
cleaner.
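
Validating the total length could be as small as the following sketch. The constant name LINE_WIDTH comes from this thread, but its value here and the helper name `invalid_lines` are assumptions for illustration.

```ruby
# Hypothetical record length including the newline; illustration only.
LINE_WIDTH = 31

# Returns the 0-based indexes of lines whose total length is wrong.
def invalid_lines(lines, width = LINE_WIDTH)
  lines.each_index.reject { |i| lines[i].bytesize == width }
end

good = 'x' * 30 + "\n"
bad  = "too short\n"
invalid_lines([good, bad, good])  # => [1]
```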

Robert Klemme · Mar 19, 2010

See also http://groups.google.de/group/comp.lang.ruby/msg/c53f394410a6cff0
which did not (yet) make it into the mailing list.

Kind regards

robert

Hi, Robert

I don't have the time to go completely through your code. Few remarks
anyway:

That is fine, I appreciate what you did take the time to go through

- It is not clear to me why you have MailRecord composed of Records.
I'd probably rather have picked another name, e.g. MailFile. Class
comments would also help.

That name would probably make sense. I'm not sure what you mean by Class
comments, though.

A comment that describes what the class is about.

Well, each line must match the line length, or when we try to pull specific
attributes from that record they will not be correct. If each line is the
correct length, then the record is the correct length.

So you are saying that all fields in the record have the same length? I
thought LINE_WIDTH would refer to the record's length.

That makes a lot of sense, thanks for the tip, I default to OO first,
because that is what I am most familiar with, so structs don't come quickly
to mind. Though I have read your article probably three times
(http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html), it usually
only comes to mind when my structure is very dynamic, i.e. OpenStruct.

Thank you

I understand your points here, but I am having difficulty thinking of a way
to implement this. Perhaps the record should have a reference to the
MailRecord (or MailFile, as you suggest), and then tell the file to write
itself? But the position thing still seems to be an issue.

Maybe the problem is that I am considering it to be almost an array of files
based on position in the file, but I should remove index/offset from
consideration and instead consider it as a set, where ordering is arbitrary
and can be altered as necessary to accommodate encapsulated logic and data
integrity.

I'm not really sure what a proper approach would look like here.

Maybe you could have a method in MailFile that writes record n.
Internally it would seek to n * record length and then write the record
passed. Your caching functionality (i.e. keeping read records) could
probably better go into another class - which potentially wraps MailFile.
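
A rough sketch of that split might look as follows. The class name CachedMailFile, the method signatures and the record length of 5 bytes are assumptions for illustration; only the idea of seeking to n * record length and wrapping MailFile for caching comes from the suggestion above.

```ruby
require 'tempfile'

# Hypothetical sketch: MailFile only knows how to move fixed-length
# records to and from disk. The record length is passed in by the caller.
class MailFile
  def initialize(path, record_length)
    @path = path
    @record_length = record_length
  end

  # Overwrites record n (0-based) with the given raw string.
  def write_record(n, raw)
    raise ArgumentError, 'wrong record length' unless raw.bytesize == @record_length
    File.open(@path, 'r+b') do |f|
      f.seek(n * @record_length)
      f.write(raw)
    end
  end

  # Reads record n (0-based) as a raw string.
  def read_record(n)
    File.open(@path, 'rb') do |f|
      f.seek(n * @record_length)
      f.read(@record_length)
    end
  end
end

# Caching is kept in a separate class that wraps MailFile.
class CachedMailFile
  def initialize(mail_file)
    @mail_file = mail_file
    @cache = {}
  end

  def read_record(n)
    @cache[n] ||= @mail_file.read_record(n)
  end

  def write_record(n, raw)
    @mail_file.write_record(n, raw)
    @cache[n] = raw
  end
end

tmp = Tempfile.new('mailfile')
tmp.write("aaaa\nbbbb\ncccc\n")
tmp.close

store = CachedMailFile.new(MailFile.new(tmp.path, 5))
store.write_record(1, "BBBB\n")
puts File.read(tmp.path)
```

Because the position is derived from n inside MailFile, the record objects themselves no longer need to carry an offset.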

Kind regards

robert
