fsync on stdout for mod_rewrite

Eric Anderson

I have a script that I want to ensure has flushed stdout after every line
of output. I have $stdout.sync = true, but when I tried $stdout.fsync
I got an Invalid Argument error, which is not what I expected from the docs.

My goal: I have a script providing lookups for mod_rewrite in Apache.
Apache hands me the HTTP_HOST header on $stdin and I return the path
where it should look for a specific website on $stdout. It seems to work
well, but every now and then it returns the wrong answer. The only two
possible causes I can see are:
1) Apache is sending the script the next request before the first request
is answered. But I am using the RewriteLock directive, so Apache should
have the requests synchronized.
2) The buffer is not getting flushed. Then another request comes in and
both answers are output, with the first answer being the wrong one.

The script seems to perform perfectly when run from the command line,
just not under Apache. Sometimes it returns the right answer and
sometimes the wrong one, even when I make the same request (i.e. refresh).

See http://httpd.apache.org/docs-2.0/mod/mod_rewrite.html if you are
curious about the mod_rewrite semantics. If posting the script would
help, I can post that also.

I appreciate any pointers.

Eric
 
Glenn Parker

Eric said:
I have a script that I want to ensure has flushed stdout after every line
of output. I have $stdout.sync = true, but when I tried $stdout.fsync
I got an Invalid Argument error, which is not what I expected from the docs.

fsync is only meaningful for files in a file system. You probably want
to use $stdout.flush, but if you set $stdout.sync = true then
$stdout.flush is already being done implicitly.
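To make the distinction concrete, here is a minimal sketch using a pipe in place of the Apache connection. The variable names are only illustrative. It shows that data written without sync stays in Ruby's own buffer until flush, that sync = true makes each write go out immediately, and that fsync on a pipe is typically rejected by the operating system (the exact error depends on the platform, so the sketch only records whatever happens).

```ruby
reader, writer = IO.pipe

# IO.pipe puts the write end in sync mode by default, so turn it off first
# to see Ruby's userspace buffering.
writer.sync = false
writer.write("first\n")
buffered = begin
  reader.read_nonblock(100)
rescue IO::WaitReadable
  nil # nothing has reached the pipe yet
end

# flush pushes Ruby's buffer to the operating system.
writer.flush
after_flush = reader.read_nonblock(100)

# With sync = true every write is handed to the OS immediately.
writer.sync = true
writer.write("second\n")
after_sync = reader.read_nonblock(100)

# fsync asks the OS to commit a file to disk; a pipe has nothing to commit.
fsync_result = begin
  writer.fsync
  :ok
rescue SystemCallError, NotImplementedError => e
  e.class
end

p buffered, after_flush, after_sync, fsync_result
```

On Linux the last value is usually Errno::EINVAL, which matches the Invalid Argument error described above.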
Eric said:
If posting the script would help I
can post that also.

Maybe describing a little bit more about the connection between Apache
and your script would help, too. Is a new script process created for
every URL? If so, then this is likely not a buffering issue. If not,
then you might be failing to reset some lingering state in the Ruby
interpreter between "calls" from Apache.
 
Eric Anderson

Glenn said:
fsync is only meaningful for files in a file system. You probably want
to use $stdout.flush, but if you set $stdout.sync = true then
$stdout.flush is already being done implicitly.

I first had $stdout.flush. Then I switched to sync when I found out
about it. But according to the docs the operating system might not flush
it. I wasn't sure if that applied to $stdout, but I wanted to issue the
command just in case.

Glenn said:
Maybe describing a little bit more about the connection between Apache
and your script would help, too. Is a new script process created for
every URL? If so, then this is likely not a buffering issue. If not,
then you might be failing to reset some lingering state in the Ruby
interpreter between "calls" from Apache.

I have looked through the script 100 times and from what I can tell
everything should be OK for each request. The protocol between Apache
and the script is:

Apache sends the key followed by a newline character to the stdin of the
script. In this case I am telling Apache to send the HTTP Host header.
My script then does some logic to determine what to return, usually a
path to the website being requested. The script is started when Apache
starts up and continues to run as long as Apache runs. The path returned
is followed by a newline character (the spec says you can also return
the four characters NULL when there is no match). The docs give an
example of such a script in Perl.
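The Perl example from the docs is not reproduced here. As a rough Ruby sketch of the same line-based protocol, assuming a hypothetical in-memory SITES table in place of the database and a hypothetical serve method, the loop might look like this:

```ruby
require 'stringio'

# Hypothetical lookup table standing in for the database query.
SITES = { 'bar.com' => '/virtual/1/web/bar.com' }.freeze

# Reads one key per line from input and writes exactly one answer line per
# key to output: the mapped path, or the string NULL when there is no match.
def serve(input, output)
  output.sync = true # hand each answer to the OS as soon as it is written
  input.each_line do |line|
    key = line.chomp
    output.puts(SITES.fetch(key, 'NULL'))
  end
end

# Demonstration with in-memory streams instead of Apache.
out = StringIO.new
serve(StringIO.new("bar.com\nunknown.example\n"), out)
print out.string
```

Under Apache the call would be serve($stdin, $stdout). The important property is one answer line per request line, written before the next request is read.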

My script is as follows. Any suggestions to improve it or fix the problem
are greatly appreciated. You can probably guess the structure of the
database from the SQL statement. The idea is that it is supposed to
return the name that is the closest to what was passed in. So if you get
a request for foo.bar.com and you have a site called bar.com, it returns
bar.com. If you get a request for www.baz.bar.com and you have a site
called baz.bar.com, it will return baz.bar.com instead of just bar.com.
There are a couple of exceptions. For example, account.<any domain>.com
should return account.realsimplehosting.com. Also, anything with svn in
it should just return the domain name and not the entire path. There is
also an effort to retry requests after failure. After 10 failures it will
just return error.realsimplehosting.com. The script is attached to this
email. I attached it so the lines won't get wrapped.
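The "closest name" rule described above can be sketched in plain Ruby, without the database. This assumes a hypothetical closest_site helper and a plain array of site names; the real script does the equivalent with LIKE and ORDER BY size DESC in SQL.

```ruby
# Hypothetical in-memory stand-in for the SQL query: pick the longest
# known site name that the requested host ends with.
def closest_site(host, site_names)
  site_names.select { |name| host.end_with?(name) }.max_by(&:length)
end

sites = ['bar.com', 'baz.bar.com']
p closest_site('foo.bar.com', sites)
p closest_site('www.baz.bar.com', sites)
p closest_site('example.org', sites)
```

Like the LIKE pattern in the query, a plain suffix test also matches a host such as foobar.com against bar.com, so a stricter check on the dot boundary may be worth considering.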

#!/usr/bin/ruby

require 'dbi'

# CONSTANTS
RSH_VIRTUAL = 1
RSH_WEB_PATH = "/virtual/#{RSH_VIRTUAL}/web/"
NOT_FOUND_PATH = "#{RSH_WEB_PATH}notfound.realsimplehosting.com"
ERROR_PATH = "#{RSH_WEB_PATH}error.realsimplehosting.com"
PORTAL_PATH = "#{RSH_WEB_PATH}?.realsimplehosting.com"
PORTAL_SPECIFIERS = ['webreports', 'database', 'email', 'account', 'cp']

# Put these in the global context
$database = nil
$stmt = nil

# For caching the most recent answer. We may start caching more than this so
# that popular websites get a quicker response but I figured at a min we need
# to cache the most recent since each request calls this lookup several times.
last_request = ''
last_answer = ''

# To protect against infinite retries
loop_protect = 10

def init_database
  $database = DBI.connect 'DBI:Mysql:rsh', 'username', 'password'
  sql = <<-SQL
    SELECT
      accounts.id AS account,
      primary_domain.name AS path,
      LENGTH( aliases.name ) AS size
    FROM
      accounts INNER JOIN
      sites
        ON accounts.id = sites.account INNER JOIN
      domains AS primary_domain
        ON sites.primary_domain = primary_domain.id
      INNER JOIN
      domains AS aliases
        ON sites.id = aliases.site
    WHERE
      (? LIKE CONCAT( '%', aliases.name )) AND
      ((sites.expire_date IS NULL) OR
       (sites.expire_date <= CURRENT_DATE()))
    ORDER BY size DESC
  SQL
  $stmt = $database.prepare sql
end

init_database
$stdout.sync = true
$stdin.each do |request|
  request.chomp!
  catch(:done) do
    if request == last_request
      throw :done
    end
    last_request = request
    last_answer = nil

    # Does DBI think we are good to go?
    init_database unless !$database.nil? && $database.connected? &&
                         $database.ping

    begin
      $stmt.execute request
      if $stmt.rows == 0
        $stmt.cancel
        last_answer = NOT_FOUND_PATH
        throw :done
      end
      row = $stmt.fetch
      if request =~ /svn/i
        last_answer = row[1]
        $stmt.cancel
        throw :done
      end
      PORTAL_SPECIFIERS.each do |key|
        dev_key = "#{key}-dev"
        re = Regexp.new key, true
        dev_re = Regexp.new dev_key, true
        last_answer = PORTAL_PATH.sub '?', key if
          request =~ re
        last_answer = PORTAL_PATH.sub '?', dev_key if
          request =~ dev_re
      end
      last_answer = "/virtual/#{row[0]}/web/#{row[1]}" if
        last_answer.nil? or request =~ /bypass/
      $stmt.cancel
    rescue # In case the library gives us an error
      # Try reiniting the connection a couple of times.
      # Then just start directing to the error site
      unless loop_protect
        last_request = ''
        last_answer = ERROR_PATH
      end
      loop_protect -= 1 unless loop_protect == 0
      init_database
      retry
    end
  end
  puts last_answer
end
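One detail in the retry logic above is worth a second look: in Ruby every Integer, including 0, is truthy, so the `unless loop_protect` branch never runs and a persistent failure keeps hitting `retry`. A minimal sketch of a bounded retry, assuming a hypothetical with_retries helper (not part of the original script or of DBI), could look like this:

```ruby
MAX_ATTEMPTS = 10

# Hypothetical helper: runs the block, retrying on any StandardError.
# After max_attempts failures it gives up and returns the fallback value.
def with_retries(fallback, max_attempts: MAX_ATTEMPTS)
  attempts = 0
  begin
    yield
  rescue StandardError
    attempts += 1
    retry if attempts < max_attempts
    fallback
  end
end

# Usage: a lookup that always fails ends up at the fallback path.
calls = 0
answer = with_retries('/virtual/1/web/error.realsimplehosting.com') do
  calls += 1
  raise 'database unavailable'
end
puts answer
```

In the real script the block would reconnect and run the query. The point is only that the counter is compared explicitly and that the loop stops once the limit is reached.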
 
Glenn Parker

Eric said:
My script is as follows.

I'm not too familiar with embedding SQL in Ruby, so I may not be much
help here, but my spidey-sense tingles when I look at the way the
database connection is manipulated here.

I would try two debug modes: one where the database connection is closed
and re-opened on every Apache request, and another where the script
is restarted for every Apache request. Eliminating lingering state in
the continuous connection as a culprit might help narrow down the
problem.
 
Dominik Werder

Clear last_request after the puts so that an old value can't accidentally
be given to Apache. Maybe overcautious, anyway.. :)
bye!
Dominik
 
Eric Anderson

Dominik said:
Clear last_request after the puts so that an old value can't accidentally
be given to Apache. Maybe overcautious, anyway.. :)

last_request stores the last request for the next request. It is a basic
caching mechanism. For every web request this script is called a couple
of times because of the way mod_rewrite works. Therefore I want to cache
the most recent answer so I don't end up with a bunch of SQL queries for
each request. Also, if a site is popular it will cut down on the number
of SQL requests, although more extensive caching would be better for
that. If last_request == request then we give the same answer as for
the last request.
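The caching idea can be sketched in isolation. This assumes a hypothetical LastAnswerCache class that is not part of the script above; it remembers only the most recent request and answer.

```ruby
# Hypothetical one-entry cache: remembers only the most recent request.
class LastAnswerCache
  def initialize
    @request = nil
    @answer = nil
  end

  # Returns the cached answer when the request repeats; otherwise runs the
  # block (the expensive lookup) and remembers its result.
  def fetch(request)
    return @answer if request == @request

    @answer = yield(request)
    @request = request
    @answer
  end
end

lookups = 0
cache = LastAnswerCache.new
answers = %w[bar.com bar.com baz.com].map do |host|
  cache.fetch(host) do |h|
    lookups += 1
    "/virtual/1/web/#{h}"
  end
end
p answers, lookups
```

In this sketch the remembered request is updated only after the lookup has returned, so a lookup that raises never leaves a new request paired with an old answer.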

Eric
 
Eric Anderson

Glenn said:
I'm not too familiar with embedding SQL in Ruby, so I may not be much
help here, but my spidey-sense tingles when I look at the way the
database connection is manipulated here.

I'm just using straight Ruby. The SQL is just a Ruby here-doc. The
database connection is created once unless something goes wrong. If
something goes wrong it tries to re-init the connection. After 10 tries
it gives up.

Glenn said:
I would try two debug modes, one where the database connection closed
and re-opened on every Apache request, and another mode where the script
is restarted for every Apache request. It might help narrow down the
problem to eliminate lingering state in the continuous connection as a
culprit.

I'll give that a try, but I don't think it is a state issue. If things
are getting flushed properly then I am leaning towards some kind of
concurrency problem, even though I am using the rewrite lock. I need to
see if I am using the prefork model in Apache or some threaded model.
Maybe the threaded model is causing some problem in relation to Ruby,
even though the script is a separate process.

Thanks for your suggestions,

Eric
 
