size_t, ssize_t and ptrdiff_t

Discussion in 'C Programming' started by James Harris, Oct 12, 2013.

  1. James Kuyper Guest

    I had thought that pipes were, in the relevant senses, equivalent to
    files. I can't say that I've ever knowingly used either '<' or '|' to
    send input to a program that would use fseek() on its input file. In my
    experience, programs that do that sort of thing don't do it to either
    stdin or stdout - they open the relevant file by name. What precisely is
    the relevant difference between those two methods of passing to stdin,
    in terms of what's supposed to happen when fseek() is called?
    fseek64() and ftell64() are not reserved names as far as C is concerned.
    Strictly conforming code can use such identifiers for functions with
    external linkage, without worrying about conflicting with the POSIX
    functions of the same name. Whatever options are needed to make that
    possible could not be used when building a program which actually needed
    to use the POSIX versions.
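    For example, a strictly conforming program could carry its own external
    function under that name (hypothetical signature, purely to illustrate
    the point that the name is not reserved):

        #include <stdio.h>

        /* ISO C does not reserve the identifier fseek64, so a program may
           define its own function with that name and external linkage.   */
        int fseek64(FILE *stream, long long offset, int whence)
        {
            /* Delegate to fseek(); valid only for offsets that fit in long. */
            return fseek(stream, (long)offset, whence);
        }

        int main(void)
        {
            fseek64(stdin, 0, SEEK_SET);
            return 0;
        }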
    I can't come up with any reason why it would need to.
     
    James Kuyper, Oct 15, 2013
    #41

  2. (snip, I wrote)
    The problem occurs even in programs that don't use fseek() or ftell().

    Maybe someone was being too careful, but as best I recall
    (some years later) it was protecting against programs that might
    use fseek() or ftell(), even if they never actually do.
    The program I wrote also didn't use fseek() or ftell() (or the
    64-bit offset versions) but still failed at 2GB.

    Seems it was a Solaris feature.

    -- glen
     
    glen herrmannsfeldt, Oct 15, 2013
    #42

  3. James Kuyper Guest

    On 10/15/2013 04:56 PM, glen herrmannsfeldt wrote:
    ....
    So, what feature did "program" possess such that

    program < file1 > file2

    would succeed, while

    cat file1 | program | cat > file2

    would fail? I find it quite mysterious that the presence of "cat" in
    that command line would make a difference, unless "cat" were
    malfunctioning, and that doesn't seem to be what you're suggesting.
     
    James Kuyper, Oct 15, 2013
    #43
  4. Under Unix, pipes fill until some limit, usually very large, is reached.
    But only if you do IO in buffered mode, that is, using the fopen,
    fclose, fputc and "as if" interface. If you use open and write, with a
    file descriptor rather than a FILE *, you turn the buffering off.
    So depending on how cat and the shell are written, the buffering modes
    could be different. Whilst everything should still work, if the files
    are huge, something somewhere might break on one but not the other.
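    To make the distinction concrete, here is a rough sketch of the two
    interfaces (POSIX open()/write() assumed for the unbuffered side; the
    file names are just placeholders):

        #include <stdio.h>      /* buffered FILE * interface */
        #include <fcntl.h>      /* POSIX open()              */
        #include <unistd.h>     /* POSIX write(), close()    */

        int main(void)
        {
            /* Buffered: bytes may sit in the stdio buffer until flushed. */
            FILE *fp = fopen("out_buffered.txt", "w");
            if (fp != NULL) {
                fputc('x', fp);
                fclose(fp);             /* the flush happens here */
            }

            /* Unbuffered: each write() goes straight to the kernel. */
            int fd = open("out_raw.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd != -1) {
                write(fd, "x", 1);
                close(fd);
            }
            return 0;
        }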
     
    Malcolm McLean, Oct 15, 2013
    #44
  5. Sounds like an OS bug.
     
    Keith Thompson, Oct 16, 2013
    #45
  6. Eric Sosman Guest

    Sounds like a hazy memory.
     
    Eric Sosman, Oct 16, 2013
    #46
  7. Les Cargill Guest


    You can use setbuf() on a FILE * to turn off buffering. It's clunky but it
    works.
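    A minimal sketch of that, assuming setbuf() is called before any other
    operation on the stream (which is what the standard requires):

        #include <stdio.h>

        int main(void)
        {
            /* A null buffer pointer makes the stream unbuffered. */
            setbuf(stdout, NULL);

            /* setvbuf() does the same thing a bit more explicitly. */
            setvbuf(stderr, NULL, _IONBF, 0);

            fputs("written without stdio buffering\n", stdout);
            return 0;
        }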
     
    Les Cargill, Oct 16, 2013
    #47
  8. James Kuyper Guest

    Possibly - but it could also be a shell bug, since '<' and '|' are
    features of the shell rather than of the OS itself.

    But what I'm asking for is details about the bug.
     
    James Kuyper, Oct 16, 2013
    #48
  9. Ken Brody Guest

    On 10/15/2013 4:56 PM, glen herrmannsfeldt wrote:
    [...]
    Either the filesystem itself couldn't handle >2GB files, or check out "man
    ulimit".

    However, that wouldn't explain why you could pipe to "cat >filename" and
    have it work, since cat would have the same restrictions. Is it possible
    that the pipe version also failed at 2GB, but cat didn't give any error?
     
    Ken Brody, Oct 16, 2013
    #49
  10. The shell just fork()s, reconnects stdin/stdout in the child to the
    indicated place, and then exec()s the indicated program; the shell is
    out of the picture at that point, so it's unlikely that it could cause
    said program to crash (or not crash).
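    In outline, the redirection for "program < file1 > file2" looks something
    like this (a simplified sketch with most error handling omitted; the
    names are the ones used earlier in the thread):

        #include <sys/types.h>
        #include <sys/wait.h>
        #include <fcntl.h>
        #include <unistd.h>

        int main(void)
        {
            pid_t pid = fork();
            if (pid == 0) {                      /* child */
                int in  = open("file1", O_RDONLY);
                int out = open("file2", O_WRONLY | O_CREAT | O_TRUNC, 0644);
                dup2(in, STDIN_FILENO);          /* stdin  <- file1 */
                dup2(out, STDOUT_FILENO);        /* stdout -> file2 */
                close(in);
                close(out);
                execlp("program", "program", (char *)NULL);
                _exit(127);                      /* exec failed */
            }
            if (pid > 0)
                waitpid(pid, NULL, 0);           /* the shell waits */
            return 0;
        }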

    S
     
    Stephen Sprunk, Oct 16, 2013
    #50
  11. Unix's "everything's a file" abstraction is quite leaky: it holds only
    so long as the file operations you're performing on a non-file make
    sense for that type of non-file. Most programs just do simple reads or
    writes, so you can redirect them to non-files without encountering these
    leaks, which is why the abstraction is so powerful.
    "program < foo" connects stdin to a real file, whereas "cat foo |
    program" connects "program"'s stdin to a pipe masquerading as a file.

    IIRC, if you try to fseek() on a pipe, socket, device, etc. (i.e.
    anything that isn't really a file), it is defined to be a no-op. There
    might be an error code, but it won't (directly) crash the program.
    Many Unix programs will interpret the filename "-" as stdin/stdout or
    default to using stdin/stdout if no filename is given. The logic that
    deals with the data is usually elsewhere and might assume it was dealing
    with a real file (due to the abstraction), including doing things like
    fseek().
    It's been ages since I've developed on Solaris, but the usual Unix
    practice is to put nearly everything into libc as "weak" symbols. If
    you have a function of your own called "fseek64()", that will be a
    "strong" symbol. As you might have guessed from the names, the linker
    will prefer a "strong" symbol over a "weak" one when resolving a call.
    That way, everything works as expected.
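    A toy illustration of the mechanism (GCC-style weak attribute assumed;
    the real libc arrangements differ in detail):

        /* weak_stub.c -- stand-in for the library's weak definition */
        #include <stdio.h>

        __attribute__((weak)) void report(void)
        {
            puts("weak library version");
        }

        /* main.c -- the application supplies a strong definition */
        #include <stdio.h>

        void report(void)
        {
            puts("strong application version");
        }

        int main(void)
        {
            report();   /* the linker resolves this to the strong symbol */
            return 0;
        }

        /* Build: cc -c weak_stub.c && cc main.c weak_stub.o
           Running ./a.out prints "strong application version".  */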

    Headers (even the Standard ones!) often include some non-standard
    functions and types by default; you must #define various things to slim
    them down (if desired). But most of the cruft gets stuffed into other
    headers, e.g. ones defined by POSIX, even if the functions themselves
    reside in libc.
    Nor can I, but I am regularly amazed at the "creativity" of other
    programmers. I'm too lazy to go read the source for Solaris's cat, if
    it's even available, so I'm hedging my bets.

    S
     
    Stephen Sprunk, Oct 16, 2013
    #51
  12. James Kuyper Guest

    Until I get a more detailed explanation of how it failed, I can't rule
    out the possibility that incorrect handling of the process you describe
    might be part of the problem. I know a little bit about Unix internals,
    but what I thought I knew is inconsistent with the described symptoms,
    so there's presumably something I understand incorrectly - and I still
    haven't seen an explanation that makes it clear what it is that I've
    misunderstood.
     
    James Kuyper, Oct 16, 2013
    #52
  13. (snip regarding files larger than (or equal to) 2GB.)
    This would have been Solaris 2.6 or 2.7, both SPARC and IA32.
    (We had both running, with all files on a common NFS server.)
    At that time both ufs and NFS3 supported files larger than 2GB.
    It was a feature. To avoid breaking existing programs that could
    only fseek()/ftell() with signed 32-bit values, such programs
    were only allowed to write (or, I believe, read) files smaller
    than 2GB. As the OS doesn't know in advance whether a program might
    fseek() or ftell(), it seems they chose not to wait until it
    was too late. System programs, such as cat, were rewritten (or maybe
    just recompiled). I believe that they have to use fopen64()
    instead of regular fopen().

    If you search for "large file summit" and maybe also solaris, I
    believe it is well described, though maybe not this detail.

    -- glen
     
    glen herrmannsfeldt, Oct 16, 2013
    #53
  14. Calling fseek() on a non-seekable file-like thing is an error,
    and fseek() will report that error. Not doing so would be very
    bad behavior, and would make it difficult or impossible for some
    programs to operate correctly.

    ISO C only requires it to return 0 on success, and some non-zero
    value for a request that cannot be satisfied, and does not mention
    setting errno, though like any library function it's permitted to
    set errno.

    POSIX also specifies that fseek() returns 0 on success; if
    it fails, it returns -1 and sets errno to indicate the error.
    For a non-seekable device, errno will be set to ESPIPE. (As for
    any library function, the value of errno after a successful call
    is meaningless.)
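    A small sketch of testing for that case (POSIX errno semantics assumed,
    as described above; ESPIPE is not part of ISO C):

        #include <stdio.h>
        #include <errno.h>

        int main(void)
        {
            errno = 0;
            if (fseek(stdin, 0L, SEEK_SET) != 0) {
                if (errno == ESPIPE)
                    fputs("stdin is not seekable (pipe, tty, ...)\n", stderr);
                else
                    perror("fseek");
                return 1;
            }
            fputs("stdin is seekable\n", stderr);
            return 0;
        }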

    [...]
     
    Keith Thompson, Oct 16, 2013
    #54
  15. For the gory details, from the Solaris OS team no less:
    http://unix.business.utah.edu/doc/os/solaris/misc/largefiles.pdf

    In a nutshell, a 32-bit* program using open()/fopen() on a large file
    would fail with EOVERFLOW, whereas using open64()/fopen64() would
    succeed. 64-bit* programs could use either.

    (* In this case, bitness refers to the width of a long, so ILP64 or
    I32LP64 systems would count as 64-bit, but IL32LLP64 ones wouldn't.)
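    Roughly what the transitional interface looks like in code (a sketch;
    the file name is a placeholder, and on current systems one would more
    likely just build with _FILE_OFFSET_BITS=64):

        #define _LARGEFILE64_SOURCE   /* expose open64() on many systems */
        #include <errno.h>
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            /* Classic interface: in a 32-bit program this fails with
               EOVERFLOW if huge.dat is 2GB or larger.                 */
            int fd = open("huge.dat", O_RDONLY);
            if (fd == -1 && errno == EOVERFLOW)
                fputs("open: file too large for a 32-bit off_t\n", stderr);
            if (fd != -1)
                close(fd);

            /* Transitional interface: 64-bit offsets even when off_t
               is 32 bits, so the same open succeeds.                 */
            int fd64 = open64("huge.dat", O_RDONLY);
            if (fd64 != -1)
                close(fd64);
            return 0;
        }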

    S
     
    Stephen Sprunk, Oct 17, 2013
    #55
  16. Seebs Guest

    I once introduced a bug similar to that with pseudo, although that's
    getting far into the "well of course that doesn't work" category.

    -s
     
    Seebs, Oct 17, 2013
    #56
  17. A higher level of abstraction. It is surprising to see how well the very old
    definition of ALGOL68 (from before you were born) holds up.
    It even caters for multiprocessing in calculations, with exact specifications
    about which parts of statements are executed concurrently. We are only just
    entering that stage now.

    Groetjes Albert
     
    Albert van der Horst, Oct 25, 2013
    #57
