Strange logger rotation failure

Discussion in 'Ruby' started by Daniel Berger, Jul 11, 2005.

  1. Hi all,

    Ruby 1.8.2
    Solaris 10

    I have a couple Ruby scripts running via cron that connect to an Oracle
    database and execute some stored procedures - nothing fancy. On
    Saturday, July 2, 2005, the logger package spewed a strange error at me:

    Your "cron" job on itwsp193910wss
    log_cleanup_dev.rb

    produced the following output:

    /usr/local/lib/ruby/1.8/logger.rb:516:in `write': Shifting failed.
    'log_cleanup_dev.log.20050630' already exists. (Logger::ShiftingError)
    from /usr/local/lib/ruby/1.8/logger.rb:318:in `add'
    from /usr/local/lib/ruby/1.8/logger.rb:399:in `fatal'
    from log_cleanup_dev.rb:68

    First, a bit of background. There are two nearly identical scripts that
    run - one is log_cleanup.rb and the other is log_cleanup_dev.rb. The
    only difference between them is the database connect string (i.e. one
    connects to a production database, and the script in question connects
    to a dev database).

    The first curious thing to me is that this error was raised on July
    *2nd* (it also happened on July 8th, btw). The second (less) curious
    thing is that this was only raised by the dev script. It's less
    curious, because the reason the script failed in both instances was
    because the DBA was doing some maintenance on the dev database on those
    mornings.

    # Relevant snippet from log_cleanup_dev.rb (line 68 noted):

    # Logger instance
    log = Logger.new("log_cleanup_dev.log", "monthly")

    # and later...
    begin
    sth = dbh.prepare("begin my_proc; end;")
    sth.execute
    rescue DBI::Error => e
    msg = "Error executing stored procedure 'my_proc': #{e.errstr}"
    log.fatal{ msg } # Line 68
    raise
    rescue => e
    msg = "Unexpected error executing stored procedure 'my_proc': #{e}"
    log.fatal{ msg }
    raise
    ensure
    sth.finish if sth
    end

    Any ideas? It looks to me like it's trying to rotate when it shouldn't
    or something, but I haven't dug too deeply into this yet.

    Regards,

    Dan
    Daniel Berger, Jul 11, 2005
    #1
    1. Advertising

  2. Daniel Berger

    Ara.T.Howard Guest

    On Tue, 12 Jul 2005, Daniel Berger wrote:

    > Hi all,
    >
    > Ruby 1.8.2
    > Solaris 10
    >
    > I have a couple Ruby scripts running via cron that connect to an Oracle
    > database and execute some stored procedures - nothing fancy. On Saturday,
    > July 2, 2005, the logger package spewed a strange error at me:
    >
    > Your "cron" job on itwsp193910wss
    > log_cleanup_dev.rb
    >
    > produced the following output:
    >
    > /usr/local/lib/ruby/1.8/logger.rb:516:in `write': Shifting failed.
    > 'log_cleanup_dev.log.20050630' already exists. (Logger::ShiftingError)
    > from /usr/local/lib/ruby/1.8/logger.rb:318:in `add'
    > from /usr/local/lib/ruby/1.8/logger.rb:399:in `fatal'
    > from log_cleanup_dev.rb:68
    >
    > First, a bit of background. There are two nearly identical scripts that run
    > - one is log_cleanup.rb and the other is log_cleanup_dev.rb. The only
    > difference between them is the database connect string (i.e. one connects to
    > a production database, and the script in question connects to a dev
    > database).
    >
    > The first curious thing to me is that this error was raised on July *2nd* (it
    > also happened on July 8th, btw). The second (less) curious thing is that
    > this was only raised by the dev script. It's less curious, because the
    > reason the script failed in both instances was because the DBA was doing some
    > maintenance on the dev database on those mornings.
    >
    > # Relevant snippet from log_cleanup_dev.rb (line 68 noted):
    >
    > # Logger instance
    > log = Logger.new("log_cleanup_dev.log", "monthly")
    >
    > # and later...
    > begin
    > sth = dbh.prepare("begin my_proc; end;")
    > sth.execute
    > rescue DBI::Error => e
    > msg = "Error executing stored procedure 'my_proc': #{e.errstr}"
    > log.fatal{ msg } # Line 68
    > raise
    > rescue => e
    > msg = "Unexpected error executing stored procedure 'my_proc': #{e}"
    > log.fatal{ msg }
    > raise
    > ensure
    > sth.finish if sth
    > end
    >
    > Any ideas? It looks to me like it's trying to rotate when it shouldn't or
    > something, but I haven't dug too deeply into this yet.
    >
    > Regards,
    >
    > Dan


    probably what happened is that

    first cron starts, then hangs since db is being maintained

    second cron job starts, connects, begins logging and rolls

    first cron jobs finally blows up/times out. raises error, attempts to roll
    log, and then fails to roll it since second already did.


    i've seen things like this many, many times in our production system whe
    things are cron'd. basically cron jobs can stack if one or more hangs and
    code rarely deals with this gracfully. of course, this may not be your issue
    but looking at timestamps in the log file in question, the mtime of the log
    file itself, and the cron interval should give you a clue. if it's cron'd
    with any frequency i'd put money on this.

    the way i've worked around this for some of my stuff, like rq and dirwatch, is
    to make the code run as a daemon. the daemon aquires a lockfile and refuses
    to start if one is already running. then i simply attempt to start the daemon
    via cron every fifteen minutes. that one only one instance ever runs at a
    time, which greatly simplifies making robust systems. it's also nice in the
    case of things which are supposed to run at, say, 7pm - but don't since the
    system crashed at 6:59. the cron would simply restart the daemon when the
    machine comes up and then whatever work gets done gets done at 7:01 or
    whatever.

    another thought is nfs. mtime cannot be trusted on nfs unless you sysads are
    completely stellar and totally understand nfs setups wrt inode caching -
    which, in my experience, is extremely rare.

    for what it's worth i use the builtin logger in around 200 production
    applications and have never seen an error - even on nfs - so i'm guessing
    the logger code is o.k. then again i don't use monthly rolling and don't
    exercise that code. i looked at the monthly code - it look fine to me
    but...


    hth and good luck.

    -a
    --
    ===============================================================================
    | email :: ara [dot] t [dot] howard [at] noaa [dot] gov
    | phone :: 303.497.6469
    | My religion is very simple. My religion is kindness.
    | --Tenzin Gyatso
    ===============================================================================
    Ara.T.Howard, Jul 11, 2005
    #2
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ilias Lazaridis
    Replies:
    0
    Views:
    569
    Ilias Lazaridis
    Feb 1, 2005
  2. sachin
    Replies:
    1
    Views:
    963
    Soren Kuula
    Feb 3, 2006
  3. cap
    Replies:
    3
    Views:
    282
    James Edward Gray II
    Dec 11, 2005
  4. Georges Ko
    Replies:
    4
    Views:
    266
    Georges Ko
    Jul 26, 2006
  5. Guy Rosen
    Replies:
    7
    Views:
    285
Loading...

Share This Page