Quick CGI question (specific to the CGI package)

Discussion in 'Perl Misc' started by Ted Byers, Nov 25, 2009.

  1. Ted Byers

    Ted Byers Guest

    I am using CGI, and have been able to do most of the things I need to
    do, until now.

    With package CGI (and my question is specific to what is in that
    package and what I might have to do beyond what it intrinsically
    supports), the documentation beginning with the title "CREATING A
    STANDARD HTTP HEADER" gives, among other examples, the following
    example:

    print header('image/gif');

    From this I believed that I could write something like:

    print $query->header('video/$format');
    open(FIN,"<","$fname");
    binmode(FIN);
    binmode STDOUT;
    my $fcontent;
    read FIN, $fcontent, $flength;
    print $fcontent;

    Is this appropriate, (I have seen equivalent code on examples on the
    web), or is there a way to just write the header first and then send
    whatever file has the content without opening it and writing it out in
    binary mode within my own code?

    This works adequately if you provide something like $format='avi',
    $fname to the name of whatever avi file you have, and $flength to the
    size of that file.

    In fact, it works OK when my browser (firefox) asks what program to
    use to view the content because it doesn't know what to do with a file
    with an extension of cgi. With the above code, my browser invariably
    asked me what to use to view the file, and the file name it gave was
    the name of the cgi script. If I told it to use Windows Media Player,
    it played the content as desired.

    In actuality, my script makes a video file based on request
    parameters, and puts the content into a file with a name like
    result.avi (or asf, or mpg, depending on the format of the component
    clips).

    The only way I found to get this cgi script to work as I expected was
    to use redirection instead of just writing the content of the file in
    binary mode. In other words, the following two lines (with NOTHING
    else written to standard out) work as I expected.

    my $url = "http://localhost:9080/videos/$fname";
    print $query->redirect("$url",-status=>303);

    Now, is there a way to tell the client that although the URL requested
    pointed at my cgi script, the name of the file containing the content
    is result.avi? Or do I have to resort to redirection as I have done
    now (pending further insight from CGI experts out there). Or is there
    some other package, other than CGI, that I ought to be examining?

    Thanks

    Ted
    Ted Byers, Nov 25, 2009
    #1

  2. Ted Byers

    Uri Guttman Guest

    >>>>> "TB" == Ted Byers <> writes:

    TB> From this I believed that I could write something like:

    TB> print $query->header('video/$format');
    TB> open(FIN,"<","$fname");

    don't quote scalars like that. not needed and could cause a bug down the
    line

    TB> binmode(FIN);
    TB> binmode STDOUT;

    be consistent in your style. why parens on one and not the other? also
    if on a unix platform, binmode won't matter but this makes it portable
    to winblows.

    TB> my $fcontent;
    TB> read FIN, $fcontent, $flength;

    where is $flength set? i assume you would do a -s to get the file size

    TB> print $fcontent;

    if you want more speed, use sysread and syswrite. if you want simpler
    code, use File::Slurp and its read_file and write_file subs.

    TB> Is this appropriate, (I have seen equivalent code on examples on the
    TB> web), or is there a way to just write the header first and then send
    TB> whatever file has the content without opening it and writing it out in
    TB> binary mode within my own code?

    perl has no builtin way to print a file to another handle. there may be
    some OS specific ways to do it but i don't know them.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Nov 25, 2009
    #2
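A note on the "no builtin way" point above: while there is no builtin for it, the core module File::Copy has a copy() that accepts filehandles as well as file names, so it can act as a ready-made passthrough. A minimal sketch, where send_file is a hypothetical helper name and not something from the CGI package:

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: copy the file at $path to the already-open handle $out.
# Both handles are switched to binary mode so the bytes also survive on Windows.
sub send_file {
    my ($path, $out) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;
    binmode $out;
    copy($in, $out) or die "Copy failed: $!";
    close $in;
    return;
}
```

In a CGI script the call would be something like send_file($fname, \*STDOUT) after printing the header. As far as I know copy() writes unbuffered, so it seems safest to set $| = 1 before printing the header, so that the header cannot end up behind the body.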

  3. Ted Byers

    Ted Byers Guest

    On Nov 25, 5:11 pm, "Uri Guttman" <> wrote:
    > >>>>> "TB" == Ted Byers <> writes:

    >
    >   TB> From this I believed that I could write something like:
    >
    >   TB>   print $query->header('video/$format');
    >   TB>   open(FIN,"<","$fname");
    >
    > don't quote scalars like that. not needed and could cause a bug down the
    > line
    >

    OK. NB, though, that code was copied from a very quick and dirty
    script used to test ideas (with code copied from various sources
    including examples on the web).

    >   TB>   binmode(FIN);
    >   TB>   binmode STDOUT;
    >
    > be consistent in your style. why parens on one and not the other? also
    > if on a unix platform, binmode won't matter but this makes it portable
    > to winblows.
    >

    I know. I have to make certain I can run it on any platform the boss
    may bring in, which may well include a Windows server.

    >   TB>   my $fcontent;
    >   TB>   read FIN, $fcontent, $flength;
    >
    > where is $flength set? i assume you would do a -s to get the file size
    >

    You assume correctly. As I said in my remarks, the key variables need
    to be set before the code shown.

    >   TB>   print $fcontent;
    >
    > if you want more speed, use sysread and syswrite. if you want simpler
    > code, use File::Slurp and its read_file and write_file subs.
    >

    Good to know.

    >   TB> Is this appropriate, (I have seen equivalent code on examples on the
    >   TB> web), or is there a way to just write the header first and then send
    >   TB> whatever file has the content without opening it and writing it out in
    >   TB> binary mode within my own code?
    >
    > perl has no builtin way to print a file to another handle. there may be
    > some OS specific ways to do it but i don't know them.
    >

    I would have been cleaning up the various things you mentioned as I
    refined the program to be ready to deploy.

    But the key problem remains. In my testing, the client browser thinks
    the video file content has the cgi script as the file name. Did I
    misunderstand what the CGI package documentation showed, or did I miss
    something in that package that would tell the browser that the content
    sent after the header is a video file? Does the CGI package have a
    function that is used after the header to tell the script to send a
    given file? The CGI package is huge and I may well have missed
    something in it that relates to this problem. Or is there another
    package that can be used with the CGI package to facilitate sending a
    video file (or any other MIME type)? Or do I have to rely entirely on
    redirection? Without the redirection, the browser seemed to be
    deciding what to do based on the CGI script name rather than the
    content type header (unless the header function in the CGI package
    doesn't do what the documentation implies).

    Thanks

    Ted
    Ted Byers, Nov 25, 2009
    #3
  4. Ted Byers

    Uri Guttman Guest

    >>>>> "TB" == Ted Byers <> writes:

    TB> On Nov 25, 5:11 pm, "Uri Guttman" <> wrote:
    >> >>>>> "TB" == Ted Byers <> writes:

    >>
    >>   TB> From this I believed that I could write something like:
    >>
    >>   TB>   print $query->header('video/$format');


    as tad pointed out, that will not generate the right header. use double
    quotes and retest it.

    TB> But the key problem remains. In my testing, the client browser
    TB> thinks the video file content has the cgi script as the file name.
    TB> Did I misunderstand what the CGI package documentation showed, or
    TB> did I miss something in that package that would tell the browser
    TB> that the content sent after the header is a video file? Does the
    TB> CGI package have a function that is used after the header to tell
    TB> the script to send a given file? The CGI package is huge and I
    TB> may well have missed something in it that relates to this problem.
    TB> Or is there another package that can be used with the CGI package
    TB> to facilitate sending a video file (or any other MIME type)? Or
    TB> do I have to rely entirely on redirection? Without the
    TB> redirection, the browser seemed to be deciding what to do based on
    TB> the CGI script name rather than the content type header (unless
    TB> the header function in the CGI package doesn't do what the
    TB> documentation implies).

    that is probably because it doesn't recognize video/$format as a known
    type. fix the quotes bug and see what happens.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Nov 25, 2009
    #4
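The quoting point in isolation, as a small sketch (the value of $format is just an example):

```perl
use strict;
use warnings;

my $format = 'mpeg';

my $single = 'video/$format';   # single quotes: the text is taken literally
my $double = "video/$format";   # double quotes: $format is interpolated

print "single-quoted: $single\n";   # prints: single-quoted: video/$format
print "double-quoted: $double\n";   # prints: double-quoted: video/mpeg
```

With the single-quoted form the browser receives the literal type "video/$format", which it cannot match to any known type.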
  5. Ted Byers

    Ted Byers Guest

    On Nov 25, 6:11 pm, Tad McClellan <> wrote:
    > Ted Byers <> wrote:
    > >   print $query->header('video/$format');

    >
    >                          ^             ^
    >                          ^             ^
    >
    > Single quotes do not interpolate...
    >
    > > This works adequately if you provide something like $format='avi',

    >
    > That is simply not possible.
    >
    > If it worked adequately, then it must certainly have NOT been
    > the code you've shown us...
    >

    I now know why it appeared to work adequately. The code I showed
    wrote the contents of the video file to standard out. When it does
    so, and you tell it to use the Windows Media Player to view it, the
    player displayed the contents of the file anyway. Which led to the
    question I asked. The client received the data I intended to send,
    but didn't know what to do with it unless told to ignore the extension
    on the name of the cgi script that sent it.

    Now I changed the code to use the double quotes, and I then changed
    the surrounding code to display what that line prints, and obtained
    the following:

    Content-Type: video/mpg

    However, when I comment out the text output statements and instead
    write the contents of the video file, the result is the same.

    Any other ideas?

    Thanks

    Ted
    Ted Byers, Nov 26, 2009
    #5
  6. Ted Byers

    Ted Byers Guest

    On Nov 25, 6:16 pm, "Uri Guttman" <> wrote:
    > >>>>> "TB" == Ted Byers <> writes:

    >
    >   TB> On Nov 25, 5:11 pm, "Uri Guttman" <> wrote:
    >   >> >>>>> "TB" == Ted Byers <> writes:
    >   >>
    >   >>   TB> From this I believed that I could write something like:
    >   >>
    >   >>   TB>   print $query->header('video/$format');
    >
    > as tad pointed out, that will not generate the right header. use double
    > quotes and retest it.
    >

    Done, with no change in behaviour.

    >   TB> But the key problem remains.  In my testing, the client browser
    >   TB> thinks the video file content has the cgi script as the file name.
    >   TB> Did I misunderstand what the CGI package documentation showed, or
    >   TB> did I miss something in that package that would tell the browser
    >   TB> that the content sent after the header is a video file?  Does the
    >   TB> CGI package have a function that is used after the header to tell
    >   TB> the script to send a given file?  The CGI package is huge and I
    >   TB> may well have missed something in it that relates to this problem.
    >   TB> Or is there another package that can be used with the CGI package
    >   TB> to facilitate sending a video file (or any other MIME type)?  Or
    >   TB> do I have to rely entirely on redirection?  Without the
    >   TB> redirection, the browser seemed to be deciding what to do based on
    >   TB> the CGI script name rather than the content type header (unless
    >   TB> the header function in the CGI package doesn't do what the
    >   TB> documentation implies).
    >
    > that is probably because it doesn't recognize video/$format as a known
    > type. fix the quotes bug and see what happens.
    >


    Yes, as reported, I did that. When I checked the string sent by the
    call to the header function, it printed precisely
    "Content-Type: video/mpg".

    Why would it not recognize video/mpg?

    Cheers,

    Ted
    Ted Byers, Nov 26, 2009
    #6
  7. Ted Byers

    Ted Byers Guest

    On Nov 25, 6:46 pm, "Mumia W." <paduille.4061.mumia.w> wrote:
    > On 11/25/2009 03:33 PM, Ted Byers wrote:
    >
    > > I am using CGI, and have been able to do most of the things I need to
    > > do, until now.

    >
    > > With package CGI (and my question is specific to what is in that
    > > package and what I might have to do beyond what it intrinsically
    > > supports), the documentation beginning with the title "CREATING  A
    > > STANDARD HTTP HEADER" gives, among other examples, the following
    > > example:

    >
    > > print header('image/gif');

    >
    > > From this I believed that I could write something like:

    >
    > >   print $query->header('video/$format');

    >
    > Try this instead:
    >
    > print $query->header(
    >         '-content-type' => 'text/html',
    >         '-content-disposition' => 'attachment; filename=result.avi',
    >         )
    >
    > See RFC 2616.
    >


    That produces the following server error:
    [Wed Nov 25 20:24:54 2009] [error] [client 127.0.0.1] Bad name after
    disposition' at C:/ApacheAndPerl/Apache2/cgi-bin/video.server.cgi line
    45.

    Might there be a typo in the disposition line?

    Ted

    > >   open(FIN,"<","$fname");
    > >   binmode(FIN);
    > >   binmode STDOUT;
    > >   my $fcontent;
    > >   read FIN, $fcontent, $flength;
    > >   print $fcontent;
    > > [...]

    >
    >
    Ted Byers, Nov 26, 2009
    #7
  8. Ted Byers

    Ted Byers Guest

    On Nov 25, 11:43 pm, Sherm Pendley <> wrote:
    > Ted Byers <> writes:
    > > Why would it not recognize video/mpg?

    >
    > Perhaps because it's supposed to be video/mpeg. Similarly, the MIME type
    > for a .avi file is video/x-msvideo.
    >
    > sherm--


    OK, so the root of my problem seems to be that the video subtype is
    wrong.

    Changing video/mpg to video/mpeg fixes the problem for mpeg, but the
    problem remains for the asf and avi files. For asf files I tried
    video/asf and video/x-ms-asf, and for avi files I tried video/avi,
    video/msvideo and video/x-msvideo, all to no avail. I found each of
    the variants I tried on the web (such as www.webmaster-toolkit.com/mime-types.shtml
    and pcs.cruz-network.net/faq.php, to list only two of those pages I
    found).

    NB: My line that sets content type has been changed to:

    print $query->header('-content-type' => "video/$format",
                         '-content-length' => $flength);

    I figured I might as well set the content length header at the same
    time.

    I did notice that once I changed video/mpg to video/mpeg, the client
    added the mpg extension to the script name and the media player opened
    immediately. With the other formats, it left the script name
    unchanged. Now, if I tell it to display the content using the media
    player, the content is displayed.

    Any ideas on what to use for the MIME subtype for AVI and ASF files
    that would be recognized by clients like firefox and MS IE?

    Thanks

    Ted
    Ted Byers, Nov 26, 2009
    #8
  9. Ted Byers

    Ted Byers Guest

    On Nov 26, 10:18 am, "Mumia W." <paduille.4061.mumia.w> wrote:
    > On 11/25/2009 07:27 PM, Ted Byers wrote:
    >
    >
    >
    > > On Nov 25, 6:46 pm, "Mumia W." <paduille.4061.mumia.w> wrote:
    > >> Try this instead:

    >
    > >> print $query->header(
    > >>         '-content-type' => 'text/html',
    > >>         '-content-disposition' => 'attachment; filename=result.avi',
    > >>         )

    >
    > >> See RFC 2616.

    >
    > > That produces the following server error:
    > > [Wed Nov 25 20:24:54 2009] [error] [client 127.0.0.1] Bad name after
    > > disposition' at C:/ApacheAndPerl/Apache2/cgi-bin/video.server.cgi line
    > > 45.

    >
    > > Might there be a typo in the disposition line?

    >
    > > Ted

    >
    > It should work. Try this test program:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > use CGI qw/-no_xhtml :standard/;
    >
    > my $file = 'content.avi';
    >
    > print header(
    >      '-content_type' => 'video/avi',
    >      '-content_disposition' => 'attachment; filename=result.avi',
    >      );
    >
    > open my $fh, '<', $file or die("Failure: $!");
    > fpassthrough($fh);
    > close $fh;
    >
    > sub fpassthrough {
    >      my ($handle) = @_;
    >      local $/ = \1000;
    >      local $_;
    >      while (<$handle>) {
    >          print;
    >      }
    >
    > }
    >
    >


    Yup. I had to edit a bit so it would work on Windows (path to perl
    and use binmode on the file handle), but that worked. So I have to
    compare that with what I had yesterday to discover why mine didn't
    work.

    Do you know if that works for mpeg and asf files? What would you set
    the content type to? And I notice you don't set content length with
    this.

    Thanks.

    Ted
    Ted Byers, Nov 26, 2009
    #9
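On the content-length question: a rough sketch of how the header arguments could be put together, including the size. download_header_args is a hypothetical helper. The first two keys are spelled as in the test program quoted above; '-content_length' is my assumption, on the expectation that CGI passes it through as a Content-Length header the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for CGI's header().
#   $path        - file on the server that will be sent
#   $mime_type   - e.g. 'video/x-msvideo'
#   $client_name - file name the browser should show, e.g. 'result.avi'
sub download_header_args {
    my ($path, $mime_type, $client_name) = @_;
    my $size = -s $path;                      # size in bytes, undef on failure
    die "Cannot stat $path: $!" unless defined $size;
    return (
        '-content_type'        => $mime_type,
        '-content_disposition' => "attachment; filename=$client_name",
        '-content_length'      => $size,      # assumed key, see note above
    );
}

# Intended use inside the CGI script (not executed here):
#   print header(download_header_args($fname, $type, 'result.avi'));
```

The file name is left unquoted as in the example above, so this sketch only suits names without spaces or other special characters.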
  10. Ted Byers

    Ted Byers Guest

    On Nov 26, 11:49 am, Sherm Pendley <> wrote:
    > Ted Byers <> writes:
    > > problem remains for the asf and avi files.  For asf files I tried
    > > video/asf and video/x-ms-asf, and for avi files I tried video/avi,
    > > video/msvideo and video/x-msvideo, all to no avail.

    >
    > There's no need to guess - just look at Apache's mime.types file to see
    > what MIME type it maps to a given filename extension. The relevant lines
    > from my local copy of that file are:
    >
    >     video/x-msvideo      avi
    >     video/x-ms-asf        asf asx
    >
    > sherm--


    OK, on mine, there is a line like yours for avi files, but there is
    nothing in the mime.types file on my system for asf files. This is
    puzzling since everything works well if I just redirect to an asf file
    in htdocs instead of setting the content type and then writing the
    content of the file to standard out in binary mode.

    But that doesn't cover what is happening on the client side. Even
    though the server may not send video/avi as the MIME type for an avi
    file, both Firefox and MS IE recognize video/avi. I know this because
    Mumia's latest example worked fine even though he set the content type
    to video/avi. That gives me an idea, from what you said and what
    Mumia's example does, that I will have to test after lunch.

    Cheers,

    Ted
    Ted Byers, Nov 26, 2009
    #10
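Since the format is only known at run time, the types mentioned so far in this thread can be kept in a small lookup table keyed by file extension. A sketch only: mime_type_for is a hypothetical name, the entries are the ones quoted from mime.types above plus video/mpeg, and the fallback to application/octet-stream is my own choice.

```perl
use strict;
use warnings;

# Extension => MIME type, taken from the values discussed in this thread.
my %MIME_FOR = (
    mpg  => 'video/mpeg',
    mpeg => 'video/mpeg',
    avi  => 'video/x-msvideo',
    asf  => 'video/x-ms-asf',
    asx  => 'video/x-ms-asf',
);

# Hypothetical helper: pick a content type from the file name's extension.
# Unknown or missing extensions fall back to a generic binary type.
sub mime_type_for {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/\\]+)\z/;
    return 'application/octet-stream' unless defined $ext;
    return $MIME_FOR{ lc $ext } // 'application/octet-stream';
}
```

For example, mime_type_for('result.avi') gives 'video/x-msvideo', and the result can be passed straight to header().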
  11. On Wed, 25 Nov 2009 23:57:06 +0100, Ted Byers <>
    wrote:

    > In my testing, the client browser thinks
    > the video file content has the cgi script as the file name.


    Of course, the URL is http://host/cgi-bin/script.pl or something like
    that. The browser thinks the file is called "script.pl".

    An easy way to change this is to use the URL
    http://host/cgi-bin/script.pl/file.avi (or whatever file name you want to
    have). Apache will know to actually call your script.pl, and not try to
    access script.pl as a directory.
    Jochen Lehmeier, Nov 26, 2009
    #11
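With a URL of that shape the web server passes the part after the script name to the script in the PATH_INFO environment variable ("/file.avi" in the example). A small sketch of picking the name out of it, where requested_name is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: return the last path component of PATH_INFO,
# or undef when there is none.
sub requested_name {
    my ($path_info) = @_;
    return undef unless defined $path_info;
    my ($name) = $path_info =~ m{([^/]+)\z};
    return $name;
}

# Inside a CGI script this would be called as:
#   my $name = requested_name($ENV{PATH_INFO});
```

The name in the URL is only what the client asked for, so the script still has to decide for itself which file and which content type it actually sends.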
  12. Ted Byers

    Ted Byers Guest

    On Nov 26, 12:40 pm, Ted Byers <> wrote:
    > On Nov 26, 11:49 am, Sherm Pendley <> wrote:
    >
    > > Ted Byers <> writes:
    > > > problem remains for the asf and avi files.  For asf files I tried
    > > > video/asf and video/x-ms-asf, and for avi files I tried video/avi,
    > > > video/msvideo and video/x-msvideo, all to no avail.

    >
    > > There's no need to guess - just look at Apache's mime.types file to see
    > > what MIME type it maps to a given filename extension. The relevant lines
    > > from my local copy of that file are:

    >
    > >     video/x-msvideo      avi
    > >     video/x-ms-asf        asf asx

    >
    > > sherm--

    >
    > OK, on mine, there is a line like yours for avi files, but there is
    > nothing in the mime.types file on my system for asf files.  This is
    > puzzling since everything works well if I just redirect to an asf file
    > in htdocs instead of setting the content type and then writing the
    > content of the file to standard out in binary mode.
    >
    > But that doesn't cover what is happening on the client side.  Even
    > though the server may not send video/avi as the MIME type for an avi
    > file, both Firefox and MS IE recognize video/avi.  I know this because
    > Mumia's latest example worked fine even though he set the content type
    > to video/avi.  That gives me an idea, from what you said and what
    > Mumia's example does, that I will have to test after lunch.
    >
    > Cheers,
    >
    > Ted


    OK, final result: the idea that I got just before lunch, from
    combining what Mumia provided and what sherm said paid off, and now
    everything works as expected. And I even improved performance by
    modifying Mumia's example to use sysread and syswrite.

    There are still aspects of the behaviour I saw previously that I don't
    understand. For example, once I used video/mpeg as the content type
    (not using Mumia's example) the client believed the file name was
    'my.cgi.script.cgi.mpg' and knew enough to try to open it using
    Windows Media Player, but with all the other content types, the same
    client believed the file name was 'my.cgi.script.cgi'. Why the
    difference? Anyway, although I am not happy with this gap in my
    understanding, I can proceed to the next step.

    When I applied Mumia's example to my own code, in each case, whether I
    was sending an asf file, an avi file or an mpg file, in every case the
    client believed the file name was what was actually the correct file
    name for the clip being sent, and as a result, in each case the file
    was displayed correctly using Windows Media Player.

    Thanks all.

    Ted
    Ted Byers, Nov 26, 2009
    #12
  13. Ted Byers

    Ted Byers Guest

    On Nov 26, 2:06 pm, "Jochen Lehmeier" <>
    wrote:
    > On Wed, 25 Nov 2009 23:57:06 +0100, Ted Byers <>  
    > wrote:
    >
    > > In my testing, the client browser thinks
    > > the video file content has the cgi script as the file name.

    >
    > Of course, the URL is http://host/cgi-bin/script.pl or something like
    > that. The browser thinks the file is called "script.pl".
    >


    I can understand that. What isn't clear is why either the client or
    the server is changing that to script.pl.mpg when an mpeg is requested
    and not when the files with other video formats are requested (even
    avi and there is a line in mime.types saying what the mime type is for
    avi files).

    > An easy way to change this is to use the URL
    > http://host/cgi-bin/script.pl/file.avi (or whatever file name you want to
    > have). Apache will know to actually call your script.pl, and not try to
    > access script.pl as a directory.


    Oh. OK.

    The only issue with that solution is that I won't know until run time
    what format the requested clip is actually in.

    Thanks

    Ted
    Ted Byers, Nov 26, 2009
    #13
  14. On 2009-11-26 19:13, Ted Byers <> wrote:
    [generating videos (or actually any content-type) from CGI]
    > There are still aspects of the behaviour I saw previously that I don't
    > understand. For example, once I used video/mpeg as the content type
    > (not using Mumia's example) the client believed the file name was
    > 'my.cgi.script.cgi.mpg' and knew enough to try to open it using
    > Windows Media Player, but with all the other content types, the same
    > client believed the file name was 'my.cgi.script.cgi'. Why the
    > difference?


    As Ben already noted, the "file name" in an URI is supposed to be
    completely immaterial to the browser. Whether the URL ends in
    "video.cgi" or "video.mpg" or "video.html" should not make any
    difference. The only thing that is important for the browser is the
    content-type. When the browser recognizes the content-type, it knows how
    to handle the file, e.g., to call ms media player. It also knows (on
    Windows) which extension a file of this type is supposed to have, so it
    can add a proper extension.

    (Unfortunately, Firefox subscribes to the "the truth is much too
    complicated for the average user, so we lie to them and confuse the heck
    out of them" school of thought - so you can't believe anything it
    displays in dialog boxes. But at least it does the right thing
    internally, unlike IE, which both ignores the content type whenever it
    feels like it and lies to the user)

    hp
    Peter J. Holzer, Nov 27, 2009
    #14
  15. On 2009-11-25 22:11, Uri Guttman <> wrote:
    >>>>>> "TB" == Ted Byers <> writes:

    > TB> my $fcontent;
    > TB> read FIN, $fcontent, $flength;
    >
    > where is $flength set? i assume you would do a -s to get the file size
    >
    > TB> print $fcontent;
    >
    > if you want more speed, use sysread and syswrite.


    sysread/syswrite probably aren't much faster than read/print. The latter
    have a bit more buffer handling overhead but that is almost certainly
    negligible when you read data from a disk and send it over the network.

    However, if the files are large (and videos can be quite large), you can
    save quite a lot of time by reading the file in smallish chunks (a few
    kB to a few MB) and send each chunk immediately. If you read the whole
    file into memory first and then send it to the client the times for
    reading from disk and sending over the net add up. Otherwise they
    overlap resulting in a shorter total time.

    hp
    Peter J. Holzer, Nov 27, 2009
    #15
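The chunked approach described above could look roughly like this. It is a sketch, not a tuned implementation: stream_file is a hypothetical name and the 64 kB default chunk size is an arbitrary pick from the "few kB to a few MB" range.

```perl
use strict;
use warnings;

# Hypothetical helper: send the file at $path to the handle $out in chunks,
# so that sending starts while most of the file is still unread.
# Returns the number of bytes sent.
sub stream_file {
    my ($path, $out, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;                # arbitrary default
    open my $in, '<', $path or die "Cannot open $path: $!";
    binmode $in;                              # needed on Windows
    binmode $out;
    my $total = 0;
    my $buf;
    while (1) {
        my $got = read $in, $buf, $chunk_size;
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;                    # end of file
        print {$out} $buf or die "Write error: $!";
        $total += $got;
    }
    close $in;
    return $total;
}
```

In the CGI script this would follow the header, e.g. stream_file($fname, \*STDOUT). Memory use stays at one chunk regardless of how large the video is.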
  16. Ted Byers

    Uri Guttman Guest

    >>>>> "PJH" == Peter J Holzer <> writes:

    PJH> On 2009-11-25 22:11, Uri Guttman <> wrote:
    >>>>>>> "TB" == Ted Byers <> writes:

    TB> my $fcontent;
    TB> read FIN, $fcontent, $flength;
    >>
    >> where is $flength set? i assume you would do a -s to get the file size
    >>

    TB> print $fcontent;
    >>
    >> if you want more speed, use sysread and syswrite.


    PJH> sysread/syswrite probably aren't much faster than read/print. The latter
    PJH> have a bit more buffer handling overhead but that is almost certainly
    PJH> negligible when you read data from a disk and send it over the network.

    they both avoid stdio (or perl's version) so they are faster. how much
    depends on the amount of i/o and how many calls are made. this is why
    file::slurp uses sysread/write. see its benchmarks to see the difference
    from read/print.

    PJH> However, if the files are large (and videos can be quite large),
    PJH> you can save quite a lot of time by reading the file in smallish
    PJH> chunks (a few kB to a few MB) and send each chunk immediately. If
    PJH> you read the whole file into memory first and then send it to the
    PJH> client the times for reading from disk and sending over the net
    PJH> add up. Otherwise they overlap resulting in a shorter total time.

    for some definition of large and small! :)

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
    Uri Guttman, Nov 27, 2009
    #16
  17. perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))

    On 2009-11-27 18:49, Uri Guttman <> wrote:
    >>>>>> "PJH" == Peter J Holzer <> writes:

    >
    > PJH> On 2009-11-25 22:11, Uri Guttman <> wrote:
    > >> if you want more speed, use sysread and syswrite.

    >
    > PJH> sysread/syswrite probably aren't much faster than read/print. The latter
    > PJH> have a bit more buffer handling overhead but that is almost certainly
    > PJH> negligible when you read data from a disk and send it over the network.
    >
    > they both avoid stdio (or perl's version) so they are faster. how much
    > depends on the amount of i/o and how many calls are made. this is why
    > file::slurp uses sysread/write. see its benchmarks to see the difference
    > from read/print.


    Your benchmark was for a 300 MHz SPARC. CPU speed has improved more than
    disk speed since then.

    So I grabbed the server with the fastest disks I had access to (disk
    array of SSDs), created a file with 400 million lines of 80 characters
    (plus newline) each and ran some benchmarks:

    method                  time      speed (MB/s)
    ----------------------------------------------
    perlio $/ = "\n"        2:35.12   209
    perlio $/ = \4096       1:35.36   340
    perlio $/ = \1048576    1:35.25   340
    sysread bs = 4096       1:35.28   340
    sysread bs = 1048576    1:35.18   340

    The times are the median of three runs. Times between the runs differed
    by about 1 second, so the difference between reading line by line and
    block by block is significant, but the difference between perlio and
    sysread or between different blocksizes isn't.

    I was a bit surprised that reading line by line was so much slower than
    blockwise reading. Was it because of the higher loop overhead (81 bytes
    read per loop instead of 4096 means 50 times more overhead) or because
    splitting a block into lines is so expensive?

    So I did another run of benchmarks with different block sizes:

    method                      block   user      system   cpu   total
    read_file_by_perlio_block   4096    0.64s     26.87s   31%   1:27.91
    read_file_by_perlio_block   2048    1.48s     28.65s   34%   1:28.56
    read_file_by_perlio_block   1024    5.14s     29.03s   37%   1:30.59
    read_file_by_perlio_block   512     11.98s    31.33s   47%   1:31.22
    read_file_by_perlio_block   256     26.84s    33.13s   61%   1:36.85
    read_file_by_perlio_block   128     43.53s    29.05s   71%   1:41.66
    read_file_by_perlio_block   64      77.26s    28.16s   88%   1:59.70
    read_file_by_line                   104.68s   28.01s   93%   2:22.34

    (the times are a bit lower now because here the system was idle while it
    had a (relatively constant) load during the first batch)

    As expected elapsed time as well as CPU time increases with shrinking
    block size. However, even at 64 bytes, reading in blocks is still 20%
    faster than reading in lines, even though the loop is now executed 27%
    more often.

    Conclusions:

    * The difference between sysread and blockwise <> isn't even measurable.

    * Above 512 Bytes the block size matters very little (and above 4k, not
    at all).

    * Reading line by line is significantly slower than reading by blocks.



    > PJH> However, if the files are large (and videos can be quite large),
    > PJH> you can save quite a lot of time by reading the file in smallish
    > PJH> chunks (a few kB to a few MB) and send each chunk immediately. If
    > PJH> you read the whole file into memory first and then send it to the
    > PJH> client the times for reading from disk and sending over the net
    > PJH> add up. Otherwise they overlap resulting in a shorter total time.
    >
    > for some definition of large and small! :)


    Let's use a specific example. I have several videos on my disk. The
    largest of them is 542 MB.

    Let's assume I have this file on the aforementioned SSD array and want to
    send it over a gbit network connection. I can read the whole file in
    542MB / 340MB/s == 1.6s. I can send it over the network in
    542MB / 120MB/s == 4.5 seconds. If I first read it completely into memory
    and then send it over the network, the total transfer time is
    1.6s + 4.5s == 6.1s. If I read the file in 4kB blocks or even line by
    line (not that reading a video line by line makes much sense) I can
    still read it faster than it can be sent over the network, but since I
    start sending only milliseconds after I start reading, the total
    transfer time now is 4.5 seconds, or 35% faster.
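
    The overlapping read-and-send approach can be sketched roughly as follows.
    This is an illustration under assumptions, not code from the thread: the
    function name send_file_in_chunks and the 64 kB default block size are
    hypothetical choices.

```perl
use strict;
use warnings;

# Copy the file $fname to the output handle $out in blocks of
# $block_size bytes, writing each block as soon as it has been read,
# so that reading and sending overlap instead of adding up.
# Returns the number of bytes sent.
sub send_file_in_chunks {
    my ($fname, $out, $block_size) = @_;
    $block_size ||= 64 * 1024;    # assumed default, not a measured optimum
    open(my $in, '<', $fname) or die "cannot open $fname: $!";
    binmode($in);
    binmode($out);
    my $sent = 0;
    while (1) {
        my $n = read($in, my $buf, $block_size);
        die "read error on $fname: $!" unless defined $n;
        last if $n == 0;          # end of file
        print {$out} $buf or die "write error: $!";
        $sent += $n;
    }
    close($in);
    return $sent;
}
```

    In a CGI script this would be called as send_file_in_chunks($fname, \*STDOUT)
    after the header has been printed. At most one block is held in memory at
    a time, regardless of the file size.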

    hp
    Peter J. Holzer, Nov 28, 2009
    #17
  18. Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))

    On 2009-11-28, Peter J. Holzer wrote:
    > * Reading line by line is significantly slower than reading by blocks.


    Remember that when reading line-by-line (with 80char line), you
    actually read 80 times char-by-char.

    Yours,
    Ilya
    Ilya Zakharevich, Nov 29, 2009
    #18
  19. Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))

    On 2009-11-29, Ben Morrow wrote:
    > Quoth Ilya Zakharevich:
    >> On 2009-11-28, Peter J. Holzer wrote:
    >> > * Reading line by line is significantly slower than reading by blocks.
    >>
    >> Remember that when reading line-by-line (with 80char line), you
    >> actually read 80 times char-by-char.
    >
    > Not under normal circumstances. When perl is using buffered IO, it reads
    > a bufferful and then goes grovelling through it for line endings.


    But "grovelling" happens char-by-char [*]; then one must re-seek() to the
    position in question. Inspect how

    perl -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle "print qq([$_]) while <STDIN>"

    behaves when reading from a file and from a pipe...

    Yours,
    Ilya

    [*] Last time I checked, every PerlIO operation would go a dozen
    levels deep in subroutine calls - even when a simple macro
    count--, c = *buf++ if count > 0
    would suffice. PerlIO was written without any regard to
    maintainability and efficiency...
    Ilya Zakharevich, Nov 29, 2009
    #19
  20. Re: perlio vs. sysread speed (was: Quick CGI question (specific to the CGI package))

    On 2009-11-29, Ben Morrow wrote:
    >> >> Remember that when reading line-by-line (with 80char line), you
    >> >> actually read 80 times char-by-char.
    >> >
    >> > Not under normal circumstances. When perl is using buffered IO, it reads
    >> > a bufferful and then goes grovelling through it for line endings.
    >>
    >> But "grovelling" happens char-by-char [*]; then one must re-seek() to the
    >> position in question.
    >
    > If I run
    >
    > ~% perl -E'say for 1..1000' >foo
    > ~% ktrace perl -pe1 foo >/dev/null
    >
    > then the only syscalls I see for fd 3 are


    First, I have no idea what `say' would do. But, judging by the name,
    it probably would not do anything with line-oriented reads?

    >> Inspect how
    >>
    >> perl -wle "$in = <STDIN>; print qq({$in}); system $^X, @ARGV" -- -wle
    >> "print qq([$_]) while <STDIN>"
    >>
    >> behaves when reading from a file and from a pipe...
    >
    > ktrace says (AFAICT) that perl does a single lseek to where perl thinks
    > the file pointer should be just before calling fork(2).


    This is even better than how it was before PerlIO was introduced!

    Compare this with how it was quite recently: IIRC, about 5-7 years
    after PerlIO was introduced, when I reported a spurious seek() per
    character read (!), everybody behaved as if it was a surprise to them...

    Thanks for clarifications,
    Ilya
    Ilya Zakharevich, Nov 30, 2009
    #20