problem with ".scan"

Peter Bailey

Ruby's complaining about the following 3 lines of code. I've got it in a
new program, but I copied it directly from an older, working program.
Can someone help me understand what the problem is with the "scan" line
or, apparently, the "each" line?

Thanks,
Peter


10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})[ ]+\n/) do

Error message:

E:/PageCounts/test1.rb:12:in `scan': string modified (RuntimeError)
from E:/PageCounts/test1.rb:12
from E:/PageCounts/test1.rb:10:in `each'
from E:/PageCounts/test1.rb:10
 
ara.t.howard

10 Dir.glob("*.ps").each do |psfile|
11 file_contents = File.read(psfile)
12 file_contents.scan(/\%\%Pages: (\d{1,5})[ ]+\n/) do

the modification is probably here. can't you show us everything up through
the matching end?

-a
 
Peter Bailey

Sorry. It's a bit much. That's why I was holding back. Here's the whole
script.

require 'fileutils'
require 'kirbybase'
Dir.chdir("E:/pagecounts")
# First, create the database table.
db = KirbyBase.new
# If table exists, delete it.
db.drop_table(:pageinfo) if db.table_exists?(:pageinfo)
pageinfo_tbl = db.create_table(:pageinfo,
  :filename, {:DataType=>:String, :Index=>1},
  :lconstant, :String,
  :compcode, :String,
  :primecode, :Integer,
  :costcenter, :String,
  :acctgroup, :Integer,
  :blank, :String,
  :description, :String,
  :pagecount, :Float,
  :sjccode, :String,
  :fullname, {:DataType=>:String, :Index=>2}
)
# Import the csv file.
pageinfo_tbl.import_csv('McArdle_indexes.csv')

=begin
Parse each postscript print file in the polled directory. Create variables for:
the number of pages in each file; the number of blank pages in each file; and,
what exact pages are blank.
=end
Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/\%\%Pages: (\d{1,5})[ ]+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) != 0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "\%\%Blank page for Asura.\n\%\%Page: #{newtotalpages.to_i}\nshowpage\n"
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

=begin
Find blank pages in the postscript file. Look for the regular expression that
sees a page callout followed by postscript data that does not include
data in parentheses. Any type on a postscript page is enclosed in parentheses,
so, that's why this is a legitimate search. Blank pages have no parenthesized
data.
=end
    blanks = []
    file_contents.scan(/\%\%Page: [()0-9{1,5}] ([0-9]{1,5})\n[^\(.*\)]\%\%Page/) do |match|
      blanks.push($1)
    end
    file_contents.scan(/\%\%Blank page for Asura.\n/) do |match|
      blanks.push(totalpages.to_i + 1)
    end

=begin
Open a "pageinfo" file. Put page information about the file into it.
Notice that the variable for the total number of pages differs depending
on whether a "newtotalpages" variable exists. And, that variable only
exists if the original page count was odd and a blank had to be added.
=end
    filename = File.basename("#{psfile}", '.ps')
    pageinfofile = File.basename("#{psfile}", '.ps') + ".pageinfo"
    File.open("E:/pagecounts/#{pageinfofile}", "a") do |fileinfo|
      if newtotalpages then
        fileinfo << "#{filename}\n" <<
          "Total number of pages in this PDF: #{newtotalpages}\n" <<
          "Number of blank pages in this PDF: #{blanks.size}\n" <<
          "Specific pages that are blank in this PDF: " <<
          "#{blanks.join(', ')}\n"
      else
        fileinfo << "#{filename}\n" <<
          "Total number of pages in this PDF: #{totalpages}\n" <<
          "Number of blank pages in this PDF: #{blanks.size}\n" <<
          "Specific pages that are blank in this PDF: " <<
          "#{blanks.join(', ')}\n"
      end
    end
  end
end

=begin
Back to the database table. . . .
Query against the table and match the filename in the directory with whichever
entry in the "filename" column of the table matches. Then, if there's a match,
populate the "pagecount" field in that row of the table with the variable for
the page count, as found above. That variable name is "newtotalpages."
=end

Dir.glob("*.ps").each do |dirfile|
  result = pageinfo_tbl.select(:filename) { |r| dirfile =~ Regexp.new(r.filename) }
  pageinfo_tbl.update { |r| r.name == {filename}.set(:pagecount=>#{newtotalpages}) } unless result.nil?
end
 
ara.t.howard

Dir.glob("*.ps").each do |psfile|
  file_contents = File.read(psfile)
  file_contents.scan(/\%\%Pages: (\d{1,5})[ ]+\n/) do
    totalpages = $1
    if (totalpages.to_i % 2) !=0 then
      newtotalpages = totalpages.to_i + 1
      file_contents << "\%\%Blank page for Asura.\n\%\%Page: #{newtotalpages.to_i}\nshowpage\n"
      ^^
      ^^
      ^^
      ^^
      the modification in question
      File.open(psfile, "w") { |f| f.print file_contents }
      FileUtils.touch(psfile)
    end

so, ruby is correct, you are modifying a string while in an in-progress scan
block. easy-cheasy.

kind regards.

-a
 
Peter Bailey

Thanks. I ended the scan block before doing any file writing. That
seemed to do the trick. It still confuses me, though, because, this code
was borrowed from an existing script that I've been using for 6 months,
and, that part of it is just as you see it above.
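
For anyone landing here later, a minimal sketch of that kind of restructuring, assuming the only thing needed from the scan is the first page count. The helper name pad_to_even_pages and the sample file are made up for illustration; they are not from the script above.

```ruby
require 'tmpdir'

# Hypothetical helper (not from the thread): pads one file to an even page
# count and returns the final count, or nil when no %%Pages: comment is found.
def pad_to_even_pages(psfile)
  file_contents = File.read(psfile)

  # String#[] with a regexp and a group index returns the first capture, so no
  # scan block is open while the string is modified further down.
  totalpages = file_contents[/%%Pages: (\d{1,5})[ ]+\n/, 1]
  return nil if totalpages.nil?

  pages = totalpages.to_i
  if pages.odd?
    pages += 1
    file_contents << "%%Blank page for Asura.\n%%Page: #{pages}\nshowpage\n"
    File.write(psfile, file_contents)
  end
  pages
end

# Try it on a throwaway file so nothing real gets overwritten.
Dir.mktmpdir do |dir|
  sample = File.join(dir, 'sample.ps')
  File.write(sample, "%!PS\n%%Pages: 3 \n")
  puts pad_to_even_pages(sample)
end
```

The point is only the ordering: pull the number out first, and append and write once nothing is scanning the string any more.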
 
ara.t.howard

probably because in the old script the page count was never odd, so the branch
that appends to the string never ran - in your new script the count is odd, i'm
guessing, and so the bug is triggered. if i were you i'd update the other
script - it's a bug in waiting.
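
a rough way to see why the error only shows up some of the time: the sketch below appends inside the scan block only when the count is odd, same as the script. the helper name pad_inside_scan and the sample strings are invented for the demo.

```ruby
# Hypothetical helper: repeats the risky pattern on a copy of the text and
# returns :ok, or the error message if Ruby objects to the modification.
def pad_inside_scan(text)
  contents = text.dup
  contents.scan(/%%Pages: (\d{1,5})[ ]+\n/) do
    pages = $1.to_i
    # The string is only touched on odd counts, so even counts never
    # reach the line that upsets the in-progress scan.
    contents << "%%Blank page\n" if pages.odd?
  end
  :ok
rescue RuntimeError => e
  e.message
end

puts pad_inside_scan("%%Pages: 4 \n").inspect
puts pad_inside_scan("%%Pages: 3 \n").inspect
```

with an even count the block never modifies the string and the scan finishes quietly; with an odd count the append happens mid-scan and the error surfaces.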

regards.

-a
 
Peter Bailey

Well, I know that they're not always odd or even. They've been a mix of
both. But, I understand what you're saying. I will change my original
script. Basically, and, please tell me if I understand this correctly:
if I'm going to do a scan of a file, open the file, scan it, and then
close it. Right?
 
ara.t.howard

yup. just remember to avoid this

string = 'foobar'

string.scan(%r/foo/) do |word|
  string << 'foo' # can't modify while scanning
end
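
one way around it, as a small sketch: let scan finish and hand back its matches, then do the modifying afterwards. the helper name append_per_match is made up for this example.

```ruby
# Hypothetical helper: appends 'foo' once per match, after the scan is over.
def append_per_match(string)
  result = string.dup
  matches = result.scan(/foo/)      # no block: returns an array of matches
  matches.each { result << 'foo' }  # safe, the scan has already finished
  result
end

puts append_per_match('foobar')
```

the loop runs over the returned array, not over the string, so appending no longer interferes with anything.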

regards.

-a
 
Peter Bailey

Thanks a lot, -a! I've cleaned up my code. But, if you notice way above,
I've got a File.read in the line before the file scan. If I do an "end"
for the file scan, my "read" is still open, right? Meaning, I can still
do stuff to the open file.
 
Robert Klemme

If you're referring to your original code, then no. You use File.read(name)
which returns the whole file in a single string. No open connection is
returned.
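
A small sketch to make that concrete, using a temporary file. The two helper names are invented for illustration.

```ruby
require 'tempfile'

# Hypothetical helper: reports what File.read hands back for a given path.
def read_result_class(path)
  File.read(path).class
end

# Hypothetical helper: opens with a block and reports whether the handle was
# closed once the block returned.
def closed_after_block?(path)
  handle = nil
  File.open(path) { |f| handle = f; f.read }
  handle.closed?
end

Tempfile.create('sample') do |tmp|
  tmp.write("%%Pages: 3 \n")
  tmp.flush
  puts read_result_class(tmp.path)
  puts closed_after_block?(tmp.path)
end
```

So after File.read there is only a String to work with. To write changes back, the file has to be opened again, for example with File.open(name, "w").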

Btw, for efficiency reasons if your files are large you might consider using

File.foreach(file_name) do |line|
....
end

Or use File.readlines instead of File.read - that way you get an array with
lines and not the whole file in one piece.
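
A minimal sketch of both variants, assuming the goal is just to count "%%Page:" comments. The helper name and the sample content are made up.

```ruby
require 'tempfile'

# Hypothetical helper: counts %%Page: comments while holding one line at a time.
def count_page_comments(path)
  count = 0
  File.foreach(path) do |line|
    count += 1 if line.start_with?('%%Page:')
  end
  count
end

Tempfile.create('sample') do |tmp|
  tmp.write("%%Pages: 2 \n%%Page: 1 1\nshowpage\n%%Page: 2 2\nshowpage\n")
  tmp.flush
  puts count_page_comments(tmp.path)     # line-by-line pass
  puts File.readlines(tmp.path).size     # whole file as an array of lines
end
```

File.foreach never holds more than the current line, while File.readlines still loads everything, just split into lines.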

Kind regards

robert
 
Peter Bailey

Thanks, Robert. I'll look into that line-by-line technique. The reason I
probably haven't used it is that I often need to search for or
accommodate data that spans over multiple lines.
 
Robert Klemme

Yeah, in that case File.read is clearly superior (if the file fits into
memory that is). For me line by line is the default because it scales better
and I switch only to slurp in at once if I need line spanning. But then
again my typical problem might be different from yours so your different
default might actually be the better solution for you.
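
As a rough illustration of the line-spanning case, here is a sketch that works on a whole slurped string. It assumes each page starts with a "%%Page: label number" comment and treats a page as blank when its body has no opening parenthesis. The helper name blank_pages and that blankness test are my simplification, not the regular expression from the original script.

```ruby
# Hypothetical helper: returns the numbers of pages whose body contains no
# parenthesized text. Works on the whole file contents as one string.
def blank_pages(ps_text)
  blanks = []
  # /m lets "." cross newlines; each match yields the page number and the body
  # up to the next %%Page: comment or the end of the text.
  ps_text.scan(/^%%Page: \S+ (\d+)\n(.*?)(?=^%%Page:|\z)/m) do |number, body|
    blanks << number.to_i unless body.include?('(')
  end
  blanks
end

sample = "%%Page: 1 1\n(Hello) show\nshowpage\n" \
         "%%Page: 2 2\nshowpage\n" \
         "%%Page: 3 3\n(x) show\n"
p blank_pages(sample)
```

The block only appends to a separate array, so the scanned string itself is never modified while the scan is running.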

Kind regards

robert
 
