several replacing problems

Andrei Caragea

Hello everyone,

Take one xls file which contains two columns. Column A contains several
"keywords", let's say: today, today+1, today+2 etc., and column B
contains dates in yyyymmdd format: 20090706, 20090707, 20090708 etc.

I have another file (txt) which contains a bunch of text, including our
"keywords".

What I need to do is search the txt file for the "keywords", replace
them with the dates, and write the new content to a different file
(same name and extension) in a different folder.

The code is as follows:

require 'rubygems'
require 'roo'

oo=Excel.new("data.xls")
1.upto(56000) do |line|
  template = oo.cell(line,'A')
  date = oo.cell(line,'B')
  date = date.to_s.gsub!('-','')

  date_template = { /\b#{template}\b(?=\s|$)/ => "#{date}"}

  Dir['*.txt'].each do |file_name|
    replace = File.read( file_name )
    if template != nil
      date_template.each do |template, date|
        replace.gsub!( template, date )
      end
      puts replace
    end

    File.makedirs("results")
    File::open("results/#{file_name}","w") do |file_name|
      file_name<<replace
    end
  end
end

Problems I have:

#1 The script enters a loop, freezes after a number of iterations, then
stops with exit code: 0
#2 'puts replace' shows me that the first iteration replaces the first
"keyword" it finds, the next ones (10-20) don't replace anything, then it
starts replacing again
#3 The file created is the same as the original (no replacement was made)
#4 I don't really know if the regex above is correct. I used it to do
"whole word matching" and it seems to be OK for now.
#5 Problems I have yet to discover
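
A note on #2 and #4: keywords such as today+1 contain "+", which acts as a quantifier once the string is interpolated into a regex, so the pattern no longer matches the literal keyword. That may be why some keywords are skipped. A minimal sketch of the difference, using Regexp.escape; the sample text is made up for illustration and is not taken from the attached file:

```ruby
# Hypothetical sample data; the real keywords come from column A of the spreadsheet.
keyword = "today+1"
text    = "ship on today+1 please"

unescaped = /\b#{keyword}\b/                 # "+" here means "one or more y"
escaped   = /\b#{Regexp.escape(keyword)}\b/  # "+" is matched literally

unescaped_hit = !(text =~ unescaped).nil?    # does the raw pattern find the keyword?
escaped_hit   = !(text =~ escaped).nil?      # does the escaped pattern find it?

result = text.gsub(escaped, "20090707")
puts "unescaped matches: #{unescaped_hit}, escaped matches: #{escaped_hit}"
puts result
```

With the escaped pattern the keyword is treated as plain text, which is what a whole-word replacement needs.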

I know the code is a mess but bear with me, I've been studying Ruby only
for a month or so.

Any tips & tricks will be greatly appreciated

I have attached the txt file which needs to be read and replaced.

Thank you.

AC

Attachments:
http://www.ruby-forum.com/attachment/3869/file.txt
 
Brian Candler

Are your loops nested right? You seem to have an outer loop which reads
one line of the Excel file, and then after doing that you read through
all the files in the directory. This means that you will be reading and
writing each of your files 56,000 times.

Better to read the Excel file once into memory (e.g. a Hash structure),
and then process each file once. For each line, if the key is found in
the Hash, replace it with the value in the Hash.
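
A rough sketch of that idea, under two assumptions that are not from the original post: the spreadsheet rows are stood in for by a small in-memory array (in the real script they would come from oo.cell), and keywords in the text are separated from their surroundings by whitespace. The name fill_in is made up for the example.

```ruby
# Hypothetical stand-in for the spreadsheet rows (column A, column B);
# in the real script these pairs would come from oo.cell(line, 'A') and oo.cell(line, 'B').
rows = [["TODAY-1", "2009-07-07"], ["TODAY+2", "2009-07-10"], [nil, nil]]

# Build the lookup table once, outside any per-file loop.
dates = {}
rows.each do |keyword, date|
  next if keyword.nil?                        # skip empty spreadsheet rows
  # Non-bang gsub always returns a String; gsub! returns nil when nothing changed.
  dates[keyword] = date.to_s.gsub('-', '')
end

# Look up every whitespace-delimited token in the hash; unknown tokens are kept as-is.
def fill_in(text, dates)
  text.gsub(/\S+/) { |word| dates.fetch(word, word) }
end

sample = "from TODAY-1\nuntil TODAY+2\n"
filled = fill_in(sample, dates)
puts filled
```

Each file is then read once, passed through fill_in, and written once, instead of being rewritten for every spreadsheet row.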

Also, in ruby 1.8.x beware of code like this:

File::open("results/#{file_name}","w") do |file_name|
  file_name<<replace
end

The inner 'file_name' block variable aliases to the outer 'file_name'.
That is, if file_name contained a String before executing this code, it
will contain a File afterwards. Better:

File::open("results/#{file_name}","w") do |f|
  f<<replace
end
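
For anyone reading this on Ruby 1.9 or later: block parameters are local to the block there, so the clobbering described above is specific to 1.8.x. A distinct name such as f is still easier to read. A small sketch to check the behaviour on your own interpreter; the file name and the symbol standing in for a File handle are made up:

```ruby
file_name = "report.txt"   # hypothetical name, for illustration only

# The block parameter deliberately reuses the outer variable's name.
[:stand_in_for_a_file_handle].each do |file_name|
  file_name.to_s           # inside the block, file_name is the symbol
end

# On 1.9+ the outer variable is untouched; on 1.8.x it would now hold the symbol.
still_a_string = file_name.is_a?(String)
puts "outer file_name is still a String: #{still_a_string}"
```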
 
Andrei Caragea

Brian said:
Better to read the Excel file once into memory (e.g. a Hash structure),
and then process each file once. For each line, if the key is found in
the Hash, replace it with the value in the Hash.

This is exactly what I'm trying to do, but I can't seem to get it right.
Can you give me an example?

Also, in ruby 1.8.x beware of code like this:

File::open("results/#{file_name}","w") do |file_name|
  file_name<<replace
end

The inner 'file_name' block variable aliases to the outer 'file_name'.
That is, if file_name contained a String before executing this code, it
will contain a File afterwards. Better:

File::open("results/#{file_name}","w") do |f|
  f<<replace
end

Thank you, I changed this.
 
Andrei Caragea

I managed to figure out how to do the hash and get rid of those annoying
loops. Everything works fine except one thing:

The XLS contains the following:

A B
TODAY 20090708
TODAY+1 20090709
TODAY+10 20090718
TODAY+19 20090727
TODAY+2 20090710
TODAY+20 20090728
TODAY+24 20090801
TODAY+25 20090802
TODAY+4 20090712
TODAY+5 20090713
TODAY+6 20090714
TODAY+7 20090715
TODAY+8 20090716
TODAY+9 20090717
TODAY-1 20090707
TODAY-10 20090628
TODAY-14 20090624
TODAY-15 20090623
TODAY-19 20090619
TODAY-2 20090706
TODAY-20 20090618
TODAY-28 20090610
TODAY-29 20090609
TODAY-30 20090608
TODAY-9 20090629
TODAY-5 20090703
TODAY-25 20090613
TODAY+11 20090719
TODAY+1Y 20100708

After the replacing is done I get the following result:
ruby read_xl.rb
20090708 => +t_todaya+
20090708 =>
20090708+1 => tp1a,
20090718 => tp10a,
20090727 => tp19a,
20090708+2 => tp2a,
20090728 => tp20a,
20090801 => tp24a,
20090708+25 => tp25a,
20090708+4 => tp4a,
20090713 => tp5a,
20090714 => tp6a,
20090715 => tp7a,
20090716 => tp8a,
20090717 => tp9a,
20090719 => tp11a,
20100708 => tp365a,
20090628 => tm10a,
200907074 => tm14a,
200907075 => tm15a,
20090619 => tm19a,
20090708-2 => tm2a,
20090618 => tm20a,
20090610 => tm28a,
20090609 => tm29a,
20090608 => tm30a,
20090629 => tm9a,
20090707 => tm1a,
20090708-5 => tm5a,
20090708-25 => tm25a
Exit code: 0

Can anyone help me with a regex to fix the problem?
If there is any other way to fix it I would appreciate the feedback.

Thanks.
 
Andrei Caragea

Forgot the new code :D

require 'rubygems'
require 'roo'

h = {}
oo=Excel.new("data.xls")

1.upto(29) do |line|
  a = oo.cell(line,'A')
  date = oo.cell(line,'B')
  b = date.to_s.gsub!('-','')
  h[a] = b
end

Dir['*.txt'].each do |file_name|
  replace = File.read(file_name)
  h.each do |template, date|
    replace.gsub!(template, date)
  end
  puts replace
  File.makedirs("results")
  File::open("results/#{file_name}","w") do |f|
    f<<replace
  end
end
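
The output posted above looks consistent with shorter keywords being substituted before longer ones: TODAY is replaced inside TODAY+1 (giving 20090708+1), and TODAY-1 inside TODAY-14 (giving 200907074). One possible fix, sketched here and not tested against the attached file, is to do a single pass with one alternation in which the longest keywords come first. The hash below is a small subset of the table from the post, and the sample text is made up:

```ruby
# Subset of the keyword table from the post; the real hash is built from data.xls.
h = {
  "TODAY"    => "20090708",
  "TODAY+1"  => "20090709",
  "TODAY+10" => "20090718",
  "TODAY-1"  => "20090707",
  "TODAY-14" => "20090624"
}

# Longest keywords first, so "TODAY+10" is tried before "TODAY+1" and "TODAY".
keywords = h.keys.sort_by { |k| -k.length }
# Regexp.union escapes each string, so "+" is matched literally.
pattern = Regexp.union(keywords)

# Made-up sample text, loosely shaped like the output shown in the post.
text = "TODAY => a, TODAY+1 => b, TODAY+10 => c, TODAY-14 => d"
# One pass over the text; each match is looked up in the hash.
result = text.gsub(pattern) { |match| h[match] }
puts result
```

Because every position in the text is matched only once, a date that has already been inserted can never be mistaken for part of another keyword. In the loop above, replace.gsub!(template, date) would become a single replace.gsub!(pattern) { |m| h[m] } per file.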
 
