CGI uses file size to distinguish between regular values and files

  • Thread starter David Heinemeier Hansson
  • Start date
D

David Heinemeier Hansson

I've been having a ton of problems handling file uploads with CGI.rb
and after diving into the code, it looks like my woes can be narrowed
down to these lines in the read_multipart method:

if 10240 < content_length
require "tempfile"
body = Tempfile.new("CGI")
else
begin
require "stringio"
body = StringIO.new
rescue LoadError
require "tempfile"
body = Tempfile.new("CGI")
end
end

It sure looks like the sole differentiator on whether a request should
be treated with StringIOs or Tempfiles is the size of the request. If
this understanding is correct, I have no trouble understanding why I've
been pulling out my hair in frustration the last couple or many hours.

If I upload just one file that brings the content_length above 10240,
ALL fields in the request is treated as Tempfiles. If I'm uploading a
small file that keeps the content_length below 10240, all the fields --
including the file -- is treated as StringIOs.

For starters, how would one move a small file, treated as StringIO, to
another location? With Tempfile, you just do Tempfile#local_path, and
move away. But it's not so straight forward with StringIO.

It also seems very tedious that the application has to know whether the
file upload was big or small and act accordingly.

Maybe I'm just missing something. Like that StringIO and Tempfile is
supposed to act the same and they just don't on my system for some
reason. Or this is a bug.

Since my last cries for help haven't been overly succesful, I gather
that most consider CGI.rb to be a black box as well. So perhaps I
should get in touch with the original author. Does anyone know if Wakou
Aoyama is still actively maintaining CGI.rb?
 
K

Kirk Haines

Does anyone know if there is a DBI driver for the Firebird database, or
even a non-DBI interface to the database available someplace? I'm curious
about testing Firebird for some client applications in my development
environment.


Thanks,

Kirk Haines
 
A

ara howard

David Heinemeier Hansson said:
I've been having a ton of problems handling file uploads with CGI.rb
and after diving into the code, it looks like my woes can be narrowed
down to these lines in the read_multipart method:

if 10240 < content_length
require "tempfile"
body = Tempfile.new("CGI")
else
begin
require "stringio"
body = StringIO.new
rescue LoadError
require "tempfile"
body = Tempfile.new("CGI")
end
end

It sure looks like the sole differentiator on whether a request should
be treated with StringIOs or Tempfiles is the size of the request. If
this understanding is correct, I have no trouble understanding why I've
been pulling out my hair in frustration the last couple or many hours.

what's the frustration:

value = cgi['key']

# handle all possible cases, String, StringIO, Tempfile
value = value.read if value.respond_to? :read

If I upload just one file that brings the content_length above 10240,
ALL fields in the request is treated as Tempfiles. If I'm uploading a
small file that keeps the content_length below 10240, all the fields --
including the file -- is treated as StringIOs.

For starters, how would one move a small file, treated as StringIO, to
another location? With Tempfile, you just do Tempfile#local_path, and
move away.

# after above technique
File.open(dest){|d| d.write value.read}
But it's not so straight forward with StringIO.

it could be the same for both cases.
It also seems very tedious that the application has to know whether the
file upload was big or small and act accordingly.

it may seem tedious, but consider the case where your cgi program is
actually used by say, 10000 people at once. the way the http protocal
is, your program will have all form input uploaded as multipart even
if only one of them requires
it on the client end - this is not up to ruby or cgi.rb. now cgi.rb
has two choices:

a) make all multipart form objects objects in memory - potentially
clobbering the web servers free memeory if many requests come in - say
by making them all Strings

b) make all multipart form objects entries in the file system
thereby reducing memory usage at the expensive of io (fast and big vs.
slow and small) and, as a special case, make small form objects file
'like' objects so at least all your cgi parms will be able to be
'read'

now b is obviously preferred over a when the uploads are 'big'. cgi
seems to think 'big' is about 10240. sounds big to me - but i suppose
this is a bit arbitrary. then again ruby would allow one to change
this quite easily if one really cared...
Maybe I'm just missing something. Like that StringIO and Tempfile is
supposed to act the same and they just don't on my system for some
reason. Or this is a bug.

they are the same in that you can 'read' both of them. this should be
the same for all systems. the fact of the matter is that multipart
form values ARE differenct than non-multipart ones in the way they are
uploaded to the server - so it seems to make some sense that you'd
have different methods for the two catagories. i suppose one COULD
munge every thing into a string, but this does not allow logic like
this without clobbering memory (see above:

filename = cgi['filename']
mp3 = cgi['mp3'] # yikes if this is a String !!

filename = filename.read # probably a nice, small, StringIO

# handle small mp3s quickly and large ones without clobbering the
CPU
case mp3
when Tempfile
File.cp mp3.local_path, filename
when StringIO
File.open(filename){|f| f.write mp3.read}
end

not that one should ever write such a program ;-)
Since my last cries for help haven't been overly succesful, I gather
that most consider CGI.rb to be a black box as well. So perhaps I
should get in touch with the original author. Does anyone know if Wakou
Aoyama is still actively maintaining CGI.rb?

i use cgi.rb alot, combined with amrita, the session classes, and the
pstore session manager i wrote to enable one to use pstore as a
backend, thereby enabling storage of arbitrary (not just string)
objects as session data. i've found the combination to be really
powerful and flexible.

hope this helps.

-a
 
S

Simon Kitching

Hi David,

I haven't used Ruby's file upload facility, but since no-one else has
replied I'll chip in with my $0.02..

I suspect that the fact that tempfiles are used by this module is a
private implementation detail, ie something that users are not supposed
to be aware of, or depend upon.

Are you sure there isn't a public API provided to access the
uploaded/downloaded data "correctly", and that you aren't sneaking
around the back way by trying to directly access the temp file created?

If your code depends upon "private" implementation details of any
library, then you are asking for trouble as the implementer is perfectly
within their rights to change private implementation details in any
point release; only "public" APIs can be relied upon to remain stable
across releases. Yes, you may be able to optimise your code by accessing
private APIs of a module, but then you can't expect the module author to
care if your code breaks.

The kind of optimisation you describe is fairly common when dealing with
data flowing across networks; process it in-memory if it is small else
use the filesystem as a data cache. And this is a *private*
implementation detail that should not be exposed to the user of the
module.

Of course I haven't looked at CGI.rb's interface and may be off-track
here. And because Ruby's documentation standards are generally poor, it
may not immediately be obvious which are the public APIs and which the
private ones if the author has been a little slack about using the
public/protected/private keywoards...

Regards,

Simon
 
D

Dmitry Borodaenko

I suspect that the fact that tempfiles are used by this module is a
private implementation detail, ie something that users are not
supposed to be aware of, or depend upon.

Why, you can always check if the parameter you've got _is_ a Tempfile,
and, if it is not, assume that it is small enough to be handled in core.

Adopted from Samizdat source:

if Tempfile === file then
File.syscopy(file.path, upload)
else # StringIO
File.open(upload, 'w') {|f| f.write(file.read) }
end

Did I miss something?
 
D

Dmitry Borodaenko

Why should I have to make that distinction in *my* code? I've got a
Perl app that I plan to port over to Ruby at some point, and it allows
file uploads (up to a couple megabytes in size). Indeed, the way that
I deal with the file uploads never touches the filesystem from the
perspective of my application (the file "string" goes directly into a
MySQL database). If I have to step into my own filehandling, then I
may not port this app to Ruby even though it's a fine candidate for it
otherwise.

You've missed the question in the beginning of this thread:

For starters, how would one move a small file, treated as StringIO,
to another location? With Tempfile, you just do Tempfile#local_path,
and move away. But it's not so straight forward with StringIO.

Obviously, asker _wants_ his own file handling.

Since I do that in Samizdat, I know at least one reason for that: I
don't want to add two extra layers of indirection (Ruby and SQL) between
Apache (or rsync or Gnutella) and a hundred-megabyte video file on disk.
The library should provide me the raw data. If I decide something is a file
and want to take some shortcuts with it, then the library can provide me
additional information (e.g., CGI.tempfile?(parameter_index)) and
functionality to access that implementation detail, but I should never
*have* to make the distinction in my application code. Ever.

YMMV. I'd rather learn the difference between StringIO and Tempfile
_once_, than learn special API for handling this difference in each new
library I use. And introspection API is in Ruby for a reason, too ;)

Not that I object to this behaviour of CGI module being more explicitly
documented at least in comments in cgi.rb.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top