Logger rotation problem in multi-process/Rails environment

Guy Rosen

Hi,

There seems to be a bug (or just unexpected behavior...) in logger.rb's
rotation that causes it to either misbehave or fail the application
when running in a multi-process environment - specifically Rails.

I have Ruby on Rails under FastCGI, with multiple dispatch.fcgi
processes alive. My logger is configured for daily rotation. The
problematic scenario is this:
- Come midnight, the _first_ Rails process that happens to get a
request renames the log file (production.log =>
production.log.yyyymmdd)
- Subsequently, the remaining Rails processes attempt to rename the
file as well, but fail completely because production.log.yyyymmdd
already exists. Logger raises an exception, and users receive "500
Internal Server Error".
- Until all Rails processes are restarted, only the first one will
work and all the rest will continue to fail violently.

I'm including a patch for logger.rb that fixes Logger#shift_log_period
so that it doesn't fail violently if log.yyyymmdd already exists, but
rather just reopens the logfile and keeps on working.
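In plain Ruby, the intended behavior amounts to something like this (a hedged sketch, not the patch itself; `rotate_daily` is my own name standing in for the real `shift_log_period`):

```ruby
# Sketch of tolerant daily rotation: if the dated file already exists,
# another process rotated first, so skip the rename instead of raising,
# then reopen the live log file in append mode.
def rotate_daily(filename, date = Time.now)
  age_file = "#{filename}.#{date.strftime('%Y%m%d')}"
  if File.exist?(filename) && !File.exist?(age_file)
    begin
      File.rename(filename, age_file)
    rescue Errno::ENOENT, Errno::EEXIST
      # lost the race to a sibling process between the check and the
      # rename -- fall through and reopen
    end
  end
  File.open(filename, File::WRONLY | File::APPEND | File::CREAT)
end
```

The second and subsequent processes hit the `File.exist?(age_file)` branch, skip the rename, and simply get a fresh handle on the (new) live log.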

Note there is probably a bug when rotating by size
(Logger#shift_log_age) in the same environment (haven't tested it, but
reading the code seems to indicate it will happen). Although nothing
will fail violently, each Rails process, ignorant of its peers, will
rotate the logs and the end result will be multiple rotations!

- Guy.


-----8<-----

Index: logger.rb
===================================================================
RCS file: /src/ruby/lib/logger.rb,v
retrieving revision 1.5.2.9
diff -r1.5.2.9 logger.rb
568,570d567
<     if FileTest.exist?(age_file)
<       raise RuntimeError.new("'#{ age_file }' already exists.")
<     end
572c569,571
<       File.rename("#{@filename}", age_file)
---
 
Alex Young

Guy Rosen wrote:
> Note there is probably a bug when rotating by size
> (Logger#shift_log_age) in the same environment (haven't tested it, but
> reading the code seems to indicate it will happen). Although nothing
> will fail violently, each Rails process, ignorant of its peers, will
> rotate the logs and the end result will be multiple rotations!

There's definitely something screwy going on. Try this code:

require 'logger'

logger = Logger.new('test.log', 100, 70 * 1024)

100.times do
  fork {
    1000.times do
      logger.info("This is a test line.")
    end
  }
end

I know, 100 is a ridiculous number. It runs, though. It should result
in 100 full logfiles. However, running this on my machine (Ubuntu 6.06,
ruby 1.8.4) results in Logger::ShiftingErrors being thrown, and the
logfile list when it's finished is incomplete (the last run only left 84
logfiles, with apparently random gaps in the sequence), so there's been
data loss - I presume from when the processes fell over having failed to
log.
 
Guy Rosen

The fix for your rotate-by-size configuration is probably just a
line or two in shift_log_age that checks whether the file *on disk* has
already been rotated (by checking its size), and if so just reopens it
instead of doing all the renames.
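Concretely, that check could look something like this (a sketch only; `reopen_if_rotated` is my own name and `shift_size` mirrors Logger's size threshold):

```ruby
# Sketch: before rotating by size, stat the file *on disk*. If it is
# already below the threshold, a sibling process has rotated it out
# from under us -- just reopen it instead of renaming again.
# Returns a fresh IO if a peer already rotated, nil if rotation is
# still needed.
def reopen_if_rotated(filename, shift_size)
  return nil unless File.exist?(filename) && File.size(filename) < shift_size
  File.open(filename, File::WRONLY | File::APPEND | File::CREAT)
end
```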

Although, I should say that both this and my patch to shift_log_period
have a possible race condition in the "check then rotate" flow. Since
it's not atomic, some kind of basic IPC is needed to synchronize that
bit of code.
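A minimal form of that synchronization (an illustrative sketch only; the `.lock` sidecar file and the `with_rotation_lock` name are my own) is an exclusive `flock` held across the whole check-and-rename:

```ruby
# Sketch: serialize check-then-rotate across processes with an
# advisory lock on a sidecar file. Only the process holding the lock
# performs the rename; latecomers block, then re-check and find the
# rotation already done. (flock is advisory, and unreliable over NFS.)
def with_rotation_lock(filename)
  File.open("#{filename}.lock", File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX)   # blocks until the lock is free
    yield
  end                           # closing the file releases the lock
end

# usage (rotate_if_needed is a hypothetical helper):
# with_rotation_lock('production.log') do
#   rotate_if_needed('production.log')
# end
```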

- Guy.
 
ara.t.howard

-----8<-----

Index: logger.rb
===================================================================
RCS file: /src/ruby/lib/logger.rb,v
retrieving revision 1.5.2.9
diff -r1.5.2.9 logger.rb
568,570d567
<     if FileTest.exist?(age_file)
<       raise RuntimeError.new("'#{ age_file }' already exists.")
<     end
572c569,571
<       File.rename("#{@filename}", age_file)

this code has a serious race condition too

if not FileTest.exist?(age_file)
  #
  # but suddenly here it does!!
  #
  File.rename("#{@filename}", age_file)
end

this will be especially bad on nfs and shared filesystems where the exists?
check only reads the inode and inode caching may mean a lag of 1-3 minutes
(with normal setups) before the check reflects what's on the remote server.
invalidating an inode is tricky and platform/fs dependent as well.

additionally fixing this doesn't make logger process safe: writes will
sometimes still be overlapped because logger doesn't use any sort of ipc safe
locking for writing. if it did it would incur a massive speed penalty to boot
- esp if locking is on a shared filesystem. also, creation of a new logfile
uses (File::WRONLY | File::APPEND | File::CREAT) which also races on any
shared fs... in summary, there's almost nothing about logger.rb which makes
it process safe ;-(
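For illustration only, the write path could be serialized the same way; `locked_write` is my own name, and it carries exactly the per-message cost described above:

```ruby
# Sketch: one exclusive flock per message. This serializes writers on
# a local filesystem at the cost of two flock calls per line -- and,
# as noted above, flock cannot be trusted on NFS.
def locked_write(io, message)
  io.flock(File::LOCK_EX)
  io.write(message)
  io.flush
ensure
  io.flock(File::LOCK_UN)
end
```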


-a
 
Alex Young

ara.t.howard wrote:
> this code has a serious race condition too
>
>   if not FileTest.exist?(age_file)
>     #
>     # but suddenly here it does!!
>     #
>     File.rename("#{@filename}", age_file)
>   end
>
> this will be especially bad on nfs and shared filesystems where the exists?
> check only reads the inode and inode caching may mean a lag of 1-3 minutes
> (with normal setups) before the check reflects what's on the remote server.
> invalidating an inode is tricky and platform/fs dependent as well.
>
> additionally fixing this doesn't make logger process safe: writes will
> sometimes still be overlapped because logger doesn't use any sort of ipc safe
> locking for writing. if it did it would incur a massive speed penalty to boot
> - esp if locking is on a shared filesystem. also, creation of a new logfile
> uses (File::WRONLY | File::APPEND | File::CREAT) which also races on any
> shared fs... in summary, there's almost nothing about logger.rb which makes
> it process safe ;-(

It sounds like you're confirming my suspicion - that the only way to get
process-safe logging is by having a separate logging process which
everything else can connect to over some suitable RPC protocol... True?
 
ara.t.howard

Alex Young wrote:
> It sounds like you're confirming my suspicion - that the only way to get
> process-safe logging is by having a separate logging process which
> everything else can connect to over some suitable RPC protocol... True?

no. you could use locking of an external file (otherwise creation still has a
race). but that solution would certainly work too...



-a
 
Alex Young

ara.t.howard wrote:
> no. you could use locking of an external file (otherwise creation still has
> a race). but that solution would certainly work too...

For my application, I'm looking at being able to scale to more than one
machine, so it's at least worth looking into - and I'd really rather
avoid learning how to set up NFS for the moment if I can :)
 
ara.t.howard

Alex Young wrote:
> For my application, I'm looking at being able to scale to more than one
> machine, so it's at least worth looking into - and I'd really rather avoid
> learning how to set up NFS for the moment if I can :)

check out log4r - i think it already supports a drb based logger.

cheers.

-a
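A bare-bones version of that idea with the stdlib's DRb (a sketch only; `RemoteLogger` is my own name, and log4r's actual DRb outputter has its own API) could look like:

```ruby
require 'drb/drb'
require 'logger'

# One process owns the log file, so rotation and writes are
# serialized for free; every other process logs to it via DRb.
class RemoteLogger
  def initialize(path)
    @logger = Logger.new(path, 'daily')
  end

  def info(msg)
    @logger.info(msg)
  end
end

# server (one per cluster):
#   DRb.start_service('druby://0.0.0.0:9999', RemoteLogger.new('shared.log'))
#   DRb.thread.join
#
# client (any process, any machine; 'loghost' is a placeholder):
#   logger = DRbObject.new_with_uri('druby://loghost:9999')
#   logger.info('hello from a worker')
```

All rotation races disappear because only one process ever touches the file, at the cost of a network round trip per log line.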
 
