Perl's read() vs. sysread()

Discussion in 'Perl Misc' started by J. Romano, Feb 6, 2004.

  1. J. Romano

    J. Romano Guest

    Hi,

    I was wondering if anyone could tell me the differences between
    Perl's read() function and its sysread() function. Now, by reading
    the perldocs I know that Perl's read() function implements the
    system's fread() call and that Perl's sysread() function implements
    the system's read() call, but I really don't know what that means. I
    tried reading the man pages on fread() and read(), but that didn't
    help me much.

    Here is what else I know about read and sysread (please correct me
    if I'm wrong):

    * read belongs to a group of functions that includes read, print,
    write, seek, tell, eof, and the angle-bracket-filehandle-operator.

    * sysread belongs to a group of functions that includes sysread,
    syswrite, and sysseek.

    * The functions in these two groups should not be mixed (unless, as
    the Camel book says, I am into wizardry and/or pain). (Just an aside
    note: I once accidentally used a print statement on a socket that I
    had used sysread() on. It worked fine the first day, but froze up on
    the print statement the next day. When I finally found the error and
    changed the print statement to syswrite(), the program no longer froze
    on me. This was a classic case of "the program worked fine for me
    yesterday.")

    * The functions open, close, and binmode can be used safely with
    functions of both groups.
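
    To illustrate the mixing hazard described in the list above, here is a minimal sketch (not part of the original post) that uses print and syswrite on the same filehandle. The scratch file name mixed.txt is made up for the example, and the behaviour assumes Perl's default buffered output on a regular file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch file, used only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mixed.txt";

open(my $fh, '>', $path) or die "open for write: $!";

print {$fh} "first ";              # lands in Perl's output buffer, not yet in the file
defined syswrite($fh, "second ")   # bypasses that buffer and is written right away
    or die "syswrite: $!";
close($fh) or die "close: $!";     # only now is the buffered "first " flushed

# Read the file back to see the order in which the bytes actually landed.
open(my $check, '<', $path) or die "open for read: $!";
my $contents = do { local $/; <$check> };
close($check);

print "file contains: '$contents'\n";
```

    With default buffering the two strings typically end up in the file in the reverse of program order, which is the kind of surprise the "do not mix" advice is meant to prevent.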

    That's all I know about the difference between Perl's read() and
    sysread() functions. What I would still like to know is:

    * When should I use read() over sysread() (or sysread() over
    read())?

    * What differences in program execution can I expect if I switch my
    read() statements to sysread() (or vice-versa)?

    * If I were to open a socket over the internet using IO::Socket,
    would it be best to use the read() group of functions or the sysread()
    group of functions?

    * Since Perl's read() function uses the system's fread() call and
    Perl's sysread() function uses the system's read() call, what does
    that mean to me if I'm using those functions on a non-Unix system,
    like Win32 using ActiveState Perl? I would imagine that, in that
    particular circumstance, there would be no difference between Perl's
    read() and sysread() functions, but the mix-up I mentioned above about
    using a print statement with a sysread() function was done on a Perl
    program running on a Windows XP machine, so something different must
    be happening under the hood even on Win32 operating systems.

    Thanks in advance for any input,

    Jean-Luc
     
    J. Romano, Feb 6, 2004
    #1

  2. J. Romano

    Paul Lalli Guest

    This is far from a complete answer, but I know that one difference is that
    the sys* family of functions operates on data unbuffered, whereas the
    read(), print(), etc functions buffer their I/O. This is because this
    second group uses the stdio or perlio layers, whereas sys* bypass those
    layers to interface directly with the system.
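
    One way to see the buffering is to compare the position the buffered layer reports with the position of the underlying file descriptor. The following is a rough sketch, not something from the thread: the file name letters.txt is invented, and calling sysseek on a handle that has been read with read() is done here purely for inspection (it is exactly the kind of mixing to avoid in real code).

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempdir);

# Hypothetical scratch file holding the 26 lowercase letters.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/letters.txt";

open(my $out, '>', $path) or die "open for write: $!";
print {$out} join '', 'a' .. 'z';
close($out) or die "close: $!";

open(my $in, '<', $path) or die "open for read: $!";
binmode($in);

# Ask the buffered layer for just four bytes.
my $got = read($in, my $chunk, 4);
defined $got or die "read: $!";

my $logical_pos = tell($in);                  # where the buffered layer says we are
my $os_pos      = sysseek($in, 0, SEEK_CUR);  # where the OS-level descriptor really is

print "chunk=$chunk logical=$logical_pos os=$os_pos\n";
close($in);
```

    The logical position is 4, while the OS-level position is usually further along (for a file this small, normally the end of the file), because the buffered layer has already pulled in more data than was requested.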

    I'm sure someone else can give more details.

    Paul Lalli
     
    Paul Lalli, Feb 6, 2004
    #2

  3. J. Romano

    Ben Morrow Guest

    And, most importantly, select.
    Personally, I'd never use read.

    The difference between the two sets is that read, print, etc. all
    buffer their IO. What this means is that when you say read(...), Perl
    actually reads rather more than you ask for, and returns the rest on
    later read calls. This means fewer low-level calls to the operating
    system, which makes things more efficient. Similarly,
    when you print something, it actually only goes into a buffer. The
    whole buffer is then printed in one go when it reaches a certain size
    (or when you print a newline if the output is line-buffered). Output
    buffering can be turned off with $|.
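
    The output side can be sketched like this, assuming a regular file and Perl's default buffering. The file name out.txt is made up, and autoflush(1) from IO::Handle is used as the per-handle way of setting $| (which on its own applies to the currently selected output handle).

```perl
use strict;
use warnings;
use IO::Handle;                 # provides the autoflush() method on older perls
use File::Temp qw(tempdir);

# Hypothetical scratch file, used only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/out.txt";

open(my $fh, '>', $path) or die "open for write: $!";

print {$fh} "hello";
my $size_buffered = (stat $path)[7];   # bytes the OS has seen so far

# Roughly equivalent to: my $old = select($fh); $| = 1; select($old);
$fh->autoflush(1);
print {$fh} " world";
my $size_flushed = (stat $path)[7];    # pending and new data are now on disk

close($fh) or die "close: $!";

print "before autoflush: $size_buffered bytes, after: $size_flushed bytes\n";
```

    Before autoflush is turned on the file is normally still empty even though print has returned; afterwards every print goes straight through to the operating system.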

    The only time to use sys* is when using select. select waits for data
    to be ready on a filehandle, and sysread then lets you read what data
    is there without waiting for more: obviously, if there's a girt big
    buffer between you and the filehandle this isn't going to work.

    Buffered IO is always more efficient when you can use it.

    The only important thing is not to mix them. If you are using select
    (or IO::Select), you *must* use the sys* functions; if you are waiting
    for a response from the other end, you'd be better off using the sys*
    as otherwise you may find that you're waiting for a response to a
    request that's still sitting in your buffer (though this can be dealt
    with using $|); otherwise, you're probably best off using buffered IO
    for efficiency.

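
    A minimal sketch of the select-plus-sysread pattern described above, assuming a Unix-like system where socketpair() is available. The local socket pair stands in for a real network connection, and the 4096-byte request size and one-second timeout are arbitrary choices for the example.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);

# A connected pair of local sockets stands in for a network connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Send a short message with syswrite so nothing lingers in a Perl-level buffer.
defined syswrite($writer, "partial") or die "syswrite: $!";

my $select = IO::Select->new($reader);
my $data   = '';

# Wait up to one second for the handle to become readable.
for my $ready ($select->can_read(1)) {
    # Ask for up to 4096 bytes; sysread returns whatever is available now
    # instead of blocking until the full amount has arrived.
    my $n = sysread($ready, my $buf, 4096);
    die "sysread: $!" unless defined $n;
    $data .= $buf;
}

print "received: $data\n";
```

    Had the sender used print without flushing, the message could still be sitting in its output buffer and select would simply time out; had the receiver used read with a 4096-byte request, it would block waiting for more data than the other side ever sent.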
    Any system that supports ANSI C (read: any system perl builds on)
    supports fread(3). With 5.8, in fact, the buffering fread(3) does is
    re-implemented inside perl, as this gives both more flexibility and a
    measure of protection from certain OS's broken stdio libraries.

    'Most any OS will also support either read(2) or some equivalent. Most
    support read(2) directly: Win32 does, though it also has its own set
    of functions, in the classic Microsoft fashion of not doing a thing
    well once when you can do it badly five times.

    Ben
     
    Ben Morrow, Feb 6, 2004
    #3
  4. : (J. Romano) wrote:
    :> I was wondering if anyone could tell me the differences between
    :> Perl's read() function and its sysread() function.

    :The difference between the two sets is that read, print, etc. all
    :buffer their IO.

    That's certainly an important difference. There are sometimes other
    differences as well.


    :The only time to use sys* is when using select.

    That's not the *only* time. Sometimes functionality offered by a
    system's fcntl() call is only available when you use the sys*
    calls.


    :Buffered IO is always more efficient when you can use it.

    Not completely correct. In fact, not at all correct if you are
    into low-level I/O wizardry. Buffered I/O -always- copies the data,
    and there are situations where that data copy just isn't fast enough
    (e.g., for streaming video.) If you want top efficiency, you have
    to use the sysread() functions... or more likely, you have to drop
    into XS and do some magic down there for proper buffer alignment.

    If you want efficiency, you want *unbuffered* reads, and lots of
    good reference manuals on hand... and you probably want DMA, and
    direct I/O, and scatter-gather, and you want filesystems that
    support real-time I/O and guaranteed bandwidth (e.g., xfs) and
    you want ....
     
    Walter Roberson, Feb 6, 2004
    #4
