Reading data by words from a file in Linux system

Discussion in 'C Programming' started by Kuhl, Apr 4, 2009.

  1. Kuhl

    Kuhl Guest

    Hi, I am doing programming in Linux system. I need to read data from a
    file. See the code below. Such code handles data per byte. But I
    usually need to handle data by word (two bytes). If only handle bytes,
    then the code is very inefficient. For the sample code in this post,
    it is only comparing data, so it's not so serious yet. But in further
    parts, I will do a lot of mathematical calculation. Byte operation
    would be extremely inefficient. I don't know how to read data from a
    file in words. Is there any solution? Thanks.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(int argc, char *argv[ ])
    {
    int fd;
    int i;
    int gds_size;
    char databuf[1024];
    struct stat filebuf;

    if(stat(argv[1], &filebuf) == -1){
    printf("\nERROR: Fail to find file %s .\n", argv[1]);
    return 0;
    }

    fd = open(argv[1], O_RDONLY);
    if(fd == -1){
    printf("\nERROR: Fail to open file %s .\n", argv[1]);
    return 0;
    }

    read(fd, databuf, 10);
    if(!(databuf[0]==0&&databuf[1]==6&&databuf[2]==0&&databuf[3]
    ==2&&databuf[6]==0&&databuf[7]==28&&databuf[8]==1&&databuf[9]==2)){
    printf("\nERROR: This is not a valid GDS format.\n");
    close(fd);
    return 0;
    }

    printf("\nFurther program going on.\n");
    close(fd);
    return 0;
    }
     
    Kuhl, Apr 4, 2009
    #1

  2. Kuhl

    Guest

    On 4 Apr, 14:00, Kuhl <> wrote:

    > Hi, I am doing programming in Linux system. I need to read data from a
    > file. See the code below. Such code handles data per byte. But I
    > usually need to handle data by word (two bytes). If only handle bytes,
    > then the code is very inefficient. For the sample code in this post,
    > it is only comparing data, so it's not so serious yet. But in further
    > parts, I will do a lot of mathematical calculation. Byte operation
    > would be extremely inefficient. I don't know how to read data from a
    > file in words. Is there any solution? Thanks.


    You're asking a Unix specific question so you'd be better off asking
    in a unix specific group (eg. comp.unix.programmer).

    On the other hand reading a byte at a time may not be your problem.
    You can read larger chunks using fread(). It may be better to read
    the whole file or a large chunk of it before you do your processing.
    You can also map the file into memory but that is platform specific.
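
    As a rough sketch of the "read a large chunk" idea (the helper name
    read_chunk and its return convention are made up for illustration,
    not taken from the original code), one fread() call can fill a
    buffer that is then processed in memory:

    ```c
    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical helper: opens `path` in binary mode and reads up to
       `cap` bytes into `buf` with a single fread() call.
       Returns the number of bytes read, or -1 if the file cannot be opened. */
    long read_chunk(const char *path, unsigned char *buf, size_t cap)
    {
        FILE *f = fopen(path, "rb");
        size_t n;

        if (f == NULL)
            return -1;
        n = fread(buf, 1, cap, f);   /* may be less than cap for short files */
        fclose(f);
        return (long)n;
    }
    ```

    The caller can then walk the buffer two bytes at a time without any
    further I/O calls.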


    --
    Nick Keighley
     
    , Apr 4, 2009
    #2

  3. Kuhl

    Guest

    On 4 Apr, 19:50, William Pursell <> wrote:
    > On 4 Apr, 14:00, Kuhl <> wrote:
    >
    > > Hi, I am doing programming in Linux system. I need to read data from a
    > > file. See the code below. Such code handles data per byte. But I
    > > usually need to handle data by word (two bytes). If only handle bytes,
    > > then the code is very inefficient. For the sample code in this post,
    > > it is only comparing data, so it's not so serious yet. But in further
    > > parts, I will do a lot of mathematical calculation. Byte operation
    > > would be extremely inefficient. I don't know how to read data from a
    > > file in words. Is there any solution? Thanks.

    >
    > fread and/or fgetc will do what you want.  (ie, either call
    > fread with a 2nd argument of 2, or call fgetc twice).  Let
    > the underlying library buffer the reads to get the speed
    > you want. (ie, don't call read unless the 3rd argument
    > is BUFSIZ)
    >
    > <snip>
    >
    > >  if(stat(argv[1], &filebuf) == -1){
    > >   printf("\nERROR: Fail to find file %s .\n", argv[1]);
    > >   return 0;
    > >  }

    >
    > >  fd = open(argv[1], O_RDONLY);
    > >  if(fd == -1){
    > >   printf("\nERROR: Fail to open file %s .\n", argv[1]);
    > >   return 0;
    > >  }

    >
    > No, no, no.  A thousand times, no.  Replace both of those
    > printfs with:
    > perror( argv[ 1 ]);


    why?
     
    , Apr 4, 2009
    #3
  4. In article <>,
    <> wrote:

    >> > if(stat(argv[1], &filebuf) == -1){
    >> > printf("\nERROR: Fail to find file %s .\n", argv[1]);
    >> > return 0;
    >> > }

    >>
    >> > fd = open(argv[1], O_RDONLY);
    >> > if(fd == -1){
    >> > printf("\nERROR: Fail to open file %s .\n", argv[1]);
    >> > return 0;
    >> > }


    >> No, no, no. A thousand times, no. Replace both of those
    >> printfs with:
    >> perror( argv[ 1 ]);


    >why?


    So you get a better error message. For an even better one,
    use strerror().
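
    A minimal sketch of what that could look like for the open() call in
    the original post (the wrapper name open_or_report is hypothetical):

    ```c
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical wrapper around open(): on failure it prints the reason
       the system recorded in errno to stderr, then returns -1. */
    int open_or_report(const char *path)
    {
        int fd = open(path, O_RDONLY);

        if (fd == -1) {
            /* strerror() turns errno into a readable reason string */
            fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
            /* perror(path) would print a similar "path: reason" line */
        }
        return fd;
    }
    ```

    Either form tells the user why the call failed (missing file,
    permissions, and so on) instead of a fixed message.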

    -- Richard
    --
    Please remember to mention me / in tapes you leave behind.
     
    Richard Tobin, Apr 5, 2009
    #4
  5. Kuhl

    CBFalconer Guest

    Kuhl wrote:
    >
    > Hi, I am doing programming in Linux system. I need to read data
    > from a file. See the code below. Such code handles data per byte.
    > But I usually need to handle data by word (two bytes). If only
    > handle bytes, then the code is very inefficient. ...


    That is the reason for the getc macro, which is special in that it
    may evaluate operands more than once. The macro, if supplied, is
    able to access the file buffer byte by byte, without losing
    efficiency. It uses the normal file buffer.

    7.19.7.5 The getc function

    Synopsis
    [#1]
    #include <stdio.h>
    int getc(FILE *stream);

    Description

    [#2] The getc function is equivalent to fgetc, except that
    if it is implemented as a macro, it may evaluate stream more
    than once, so the argument should never be an expression
    with side effects.

    Returns

    [#3] The getc function returns the next character from the
    input stream pointed to by stream. If the stream is at end-
    of-file, the end-of-file indicator for the stream is set and
    getc returns EOF. If a read error occurs, the error
    indicator for the stream is set and getc returns EOF.

    Also see putc.

    --
    [mail]: Chuck F (cbfalconer at maineline dot net)
    [page]: <http://cbfalconer.home.att.net>
    Try the download section.
     
    CBFalconer, Apr 5, 2009
    #5
  6. CBFalconer <> writes:
    > Kuhl wrote:
    >> Hi, I am doing programming in Linux system. I need to read data
    >> from a file. See the code below. Such code handles data per byte.
    >> But I usually need to handle data by word (two bytes). If only
    >> handle bytes, then the code is very inefficient. ...

    >
    > That is the reason for the getc macro, which is special in that it
    > may evaluate operands more than once. The macro, if supplied, is
    > able to access the file buffer byte by byte, without losing
    > efficiency. It uses the normal file buffer.

    [...]

    The fgetc function can also access the file buffer byte by byte. The
    advantage of getc over fgetc is that it can avoid the overhead of a
    function call, but both can avoid performing physical I/O on each
    call.
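
    To illustrate, here is a small sketch (the function name sum_words16
    and the high-byte-first order are assumptions for the example) that
    consumes a stream two bytes at a time with getc() and relies on the
    stdio buffer for speed:

    ```c
    #include <stdio.h>

    /* Sums every complete 16-bit word in the stream, high byte first.
       Each getc() normally just fetches from the stdio buffer; the
       library refills that buffer in large blocks behind the scenes.
       A lone trailing byte is ignored. */
    unsigned long sum_words16(FILE *f)
    {
        unsigned long sum = 0;
        int hi, lo;

        /* && guarantees hi is read before lo */
        while ((hi = getc(f)) != EOF && (lo = getc(f)) != EOF)
            sum += ((unsigned long)hi << 8) | (unsigned long)lo;
        return sum;
    }
    ```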

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Apr 5, 2009
    #6
  7. Kuhl

    Bartc Guest

    "Kuhl" <> wrote in message
    news:...
    > Hi, I am doing programming in Linux system. I need to read data from a
    > file. See the code below. Such code handles data per byte. But I
    > usually need to handle data by word (two bytes). If only handle bytes,
    > then the code is very inefficient. For the sample code in this post,
    > it is only comparing data, so it's not so serious yet. But in further
    > parts, I will do a lot of mathematical calculation. Byte operation
    > would be extremely inefficient. I don't know how to read data from a
    > file in words. Is there any solution? Thanks.


    Try one of these functions:

    int readword(FILE* f) {
    return (fgetc(f)<<8) | fgetc(f);
    }

    int readword(FILE* f) {
    return fgetc(f) | (fgetc(f)<<8);
    }

    depending on which order you want the bytes. The functions assume a file
    opened in binary mode with fopen(), or use the equivalent fgetc for your
    open() function.

    To determine if these are fast enough, just read an entire, typical file
    using readword(), but do nothing else. That will tell you how much overhead
    reading bytes this way will be.

    --
    Bartc
     
    Bartc, Apr 5, 2009
    #7
  8. Kuhl

    Bartc Guest

    "Gordon Burditt" <> wrote in message
    news:p...
    > >Try one of these functions:
    >>
    >>int readword(FILE* f) {
    >>return (fgetc(f)<<8) | fgetc(f);
    >>}
    >>
    >>int readword(FILE* f) {
    >>return fgetc(f) | (fgetc(f)<<8);
    >>}
    >>
    >>depending on which order you want the bytes.

    >
    > Neither of these functions gives the bytes in a predictable order,
    > since there is no sequence point between the first call to fgetc()
    > (whichever one that is) and the second call to fgetc() (whichever
    > one that is).


    You're right; I only tested this with two different compilers; the third one
    returned the same order in both functions.

    >
    > Also, neither of these deal with EOF reasonably.
    >
    > Try something like:
    >
    > int readword(FILE *f) {
    > int c1;
    > int c2;
    > c1 = fgetc(f);
    > c2 = fgetc(f);
    > if (c1 == EOF || c2 == EOF) {
    > return EOF;
    > }
    > return c1 | (c2 << 8);
    > }
    >
    > This one still doesn't handle EOF reasonably on a machine with
    > 16-bit ints, because there is no way to distinguish EOF from reading
    > two 0xff bytes in succession. There isn't any special value I could
    > use to signal EOF since all possible combinations of values
    > could be produced by the two characters read.


    EOF checking on a per-byte basis is probably less important when reading a
    file known to have an even number of bytes, and when the words are known to
    be structured in a certain way and when the EOF point can be predicted.

    It might suffice to do an feof() check at strategic points, together with
    data-specific integrity checks, to detect corrupt files. Or at least to
    signal EOF in a way which doesn't require the caller to test for
    readword()==EOF every single time, which would be a nightmare.
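
    One possible shape for this, sketched under the assumption of
    low-byte-first data (the name readword_le and the status/out-parameter
    convention are illustrative, not from the posts above): return the
    status separately from the value, so a word of 0xFFFF can never be
    mistaken for EOF, and read each byte in its own statement so the
    order is defined.

    ```c
    #include <stdio.h>

    /* Reads one 16-bit word, low byte first, into *out.
       Returns 1 on success, 0 on end of file or read error
       (a lone trailing byte also counts as failure). */
    int readword_le(FILE *f, unsigned *out)
    {
        int c1 = fgetc(f);      /* first byte in the file */
        int c2;

        if (c1 == EOF)
            return 0;
        c2 = fgetc(f);          /* second byte in the file */
        if (c2 == EOF)
            return 0;
        *out = (unsigned)c1 | ((unsigned)c2 << 8);
        return 1;
    }
    ```

    A caller can loop with while (readword_le(f, &w)) and check feof() or
    ferror() once after the loop ends.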

    --
    Bartc
     
    Bartc, Apr 5, 2009
    #8
  9. Kuhl

    Kuhl Guest

    Hi, many thanks that there are so many good answers. Byte order is an
    issue in my case. I found that whatever 16-bits data or 32-bits data
    in the file I am handling define higher byte as less significant byte,
    while it's on the contrary in C. C defines higher bytes as more
    significant bytes. So eventually, I wrote a function to reverse the
    byte order for each piece of data. About the speed concern, I used big-
    size data buffer, while using pointer variables to access data. DRAM
    size is not a concern in my system. Thanks.
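
    The actual function is not shown in the thread; a byte-reversal
    helper of the kind described might look roughly like this (the names
    swap16 and swap32 are assumptions):

    ```c
    #include <stdint.h>

    /* Reverses the two bytes of a 16-bit value. */
    uint16_t swap16(uint16_t v)
    {
        return (uint16_t)((v << 8) | (v >> 8));
    }

    /* Reverses the four bytes of a 32-bit value. */
    uint32_t swap32(uint32_t v)
    {
        return (v << 24) |
               ((v << 8) & 0x00ff0000u) |
               ((v >> 8) & 0x0000ff00u) |
               (v >> 24);
    }
    ```

    Note that swapping is only correct when the file's byte order differs
    from the order of the machine running the program.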
     
    Kuhl, Apr 5, 2009
    #9
  10. Kuhl

    James Kuyper Guest

    Kuhl wrote:
    > Hi, many thanks that there are so many good answers. Byte order is an
    > issue in my case. I found that whatever 16-bits data or 32-bits data
    > in the file I am handling define higher byte as less significant byte,
    > while it's on the contrary in C. C defines higher bytes as more
    > significant bytes. ...


    C does no such thing. The byte order is up to each implementation of C
    to decide, and most implementations decide to use whatever order makes
    the most sense for the target architecture. You should avoid making any
    assumptions about the byte order, at least if you want your code to be
    portable.

    > ... So eventually, I wrote a function to reverse the
    > byte order for each piece of data. About the speed concern, I used big-
    > size data buffer, while using pointer variables to access data. DRAM
    > size is not a concern in my system. Thanks.
     
    James Kuyper, Apr 5, 2009
    #10
  11. Kuhl

    Guest

    William Pursell <> wrote:
    >
    > It is true that on some systems, errno is not set for
    > things like fopen, and that perror gives a less than
    > helpful message in that case, but the OP specifically
    > states that some flavor of Linux is being used, so that's
    > not an issue.


    He's also using open, not fopen, so it isn't even relevant.
    --
    Larry Jones

    I hope Mom and Dad didn't rent out my room. -- Calvin
     
    , Apr 5, 2009
    #11
  12. Kuhl

    Kuhl Guest

    Hi. If the byte order is up to each implementation of C, then is this
    order already fixed in a compiled executable file? If it is fixed in
    the compiled file, then this program is still portable. Thanks.
     
    Kuhl, Apr 6, 2009
    #12
  13. Kuhl

    James Kuyper Guest

    Kuhl wrote:
    > Hi. If the byte order is up to each implementation of C, then is this
    > order already fixed in a compiled executable file? ...


    So to speak. It's really fixed by the interaction between the CPU and
    the generated code. The executable file may contain an instruction
    telling the CPU to load a word of memory (keep in mind that a "word" can
    refer to different numbers of bytes on different machines) from RAM into
    a register, and then perform arithmetic operations on that value in the
    register. It is the CPU itself that interprets the bytes stored in RAM
    when they get loaded into the registers. In principle, you could build
    two different machines on which exactly the same generated machine code
    would result in those bytes being interpreted in the opposite order. You
    wouldn't be able to tell, just by looking at the executable, whether it
    was implementing big-endian or little-endian integers; you would also
    have to know which of the two machines it was being run on.

    A compiler could emulate a big-endian machine even though the executable
    will be running on a little-endian machine (or vice versa), by swapping
    bytes before loading them into registers, and after writing them from
    registers.

    > ... If it is fixed in
    > the compiled file, then this program is still portable. Thanks.


    I can't figure out how you reached that conclusion; but I can tell you
    it is false.

    A given C program, compiled for one platform, may produce an executable
    file that, when run on that platform, interprets ints as 4 8-bit bytes
    in bigendian order and 2's complement representation. That same
    executable, when run on a different platform, may produce an error
    message indicating that it's in the wrong format to BE an executable
    file for that platform.

    When that same C program is compiled for the second platform, it may
    produce a different message with the same basic meaning if you attempt
    to run the generated executable on the first platform. When you run it
    on the second platform, it may interpret ints as 2 16-bit bytes in
    little-endian order and 1's complement representation.

    Whether or not this difference will cause a problem depends very much
    upon how you wrote the code. Knowing how to write code so it will
    perform essentially the same operations on either platform is relatively
    easy, but non-trivial, if all of the data in the program is stored
    internally. However, if it requires an input source that is in some
    sense "the same" on both platforms, the way in which you must write the
    code depends upon the sense in which it is "the same", and the issue
    gets quite complicated.

    For instance, the process of transferring the data from one system to
    the other might put either one or two 8-bit bytes in each 16-bit byte.
    It might or might not change the endianness of multi-byte objects. It
    might or might not convert the 2's complement data to 1's complement.
    You'll have to know which of these options apply, in order to write the
    code so it handles the "same" input data to produce the "same" outputs.

    This is why it is often recommended that data to be transferred between
    platforms should be stored in text format. The data may still need to be
    transformed when transported to a different platform, but the issues
    created by that kind of transformation are much easier to deal with.
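
    As a small illustration of the text-format approach (the function
    names and the one-value-per-line layout are assumptions made for this
    sketch), values written with fprintf() can be read back with fscanf()
    regardless of either machine's byte order or int size, provided the
    values fit in the destination type:

    ```c
    #include <stdio.h>

    /* Writes n values as decimal text, one per line.
       Returns 0 on success, -1 on a write error. */
    int write_values(FILE *f, const long *v, int n)
    {
        int i;

        for (i = 0; i < n; i++)
            if (fprintf(f, "%ld\n", v[i]) < 0)
                return -1;
        return 0;
    }

    /* Reads up to max decimal values; returns how many were read. */
    int read_values(FILE *f, long *v, int max)
    {
        int n = 0;

        while (n < max && fscanf(f, "%ld", &v[n]) == 1)
            n++;
        return n;
    }
    ```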
     
    James Kuyper, Apr 6, 2009
    #13
  14. Kuhl

    Guest

    On 6 Apr, 03:53, Kuhl <> wrote:

    > Hi. If the byte order is up to each implementation of C, then is this
    > order already fixed in a compiled executable file? If it is fixed in
    > the compiled file, then this program is still portable. Thanks.


    This makes no sense.

    Yes, the byte order is up to each implementation of C.
    Yes, it is fixed by the compiler. Or at least the compiler
    should agree with the platform conventions.
    No, it is not portable.


    #include <stdio.h>

    int main(void)
    {
    int i = 1;
    /* view the int's storage one byte at a time */
    unsigned char *p = (unsigned char *)&i;
    printf ("lo byte:%x ho byte:%x\n", *p&0xff, *(p + 1)&0xff);
    return 0;
    }

    This gives different results on different platforms.
    If int is 16 bits (uncommon these days!) then it could
    print 00 01 or 01 00 depending on endianness.

    The order of bytes in a file will stay the same even if
    the file is moved to another platform. If the file
    is written on different platforms you may get different results.

    I say reading a word at a time is probably a bad idea. If
    you are *certain* that byte-at-a-time I/O is your problem,
    and you have MEASURED IT, then consider reading much bigger
    chunks and converting them to words internally. You may need
    to run different code on different platforms.

    int make_word (unsigned char *buffer)
    {
    #ifdef LSB_FIRST
    return *buffer & (*(buffer + 1) << 8);
    #else
    return (*buffer << 8) & *(buffer + 1);
    #endif
    }
     
    , Apr 6, 2009
    #14
  15. Kuhl

    Guest

    On 6 Apr, 11:11, wrote:
    > On 6 Apr, 03:53, Kuhl <> wrote:



    <snip>

    > I say reading a word at a time is proably a bad idea. If
    > you are *certain* that byte at a time i/o is your problem;
    > and you have MEASURED IT. Then consider reading much bigger
    > chunks and then convert them to words internally. You may need
    > to run different code on different platforms.
    >
    > int make_word (unsigned char *buffer)
    > {
    > #ifdef LSB_FIRST
    >     return *buffer & (*(buffer + 1) << 8;
    > #else
    >     return (*buffer << 8) & *(buffer + 1);
    > #endif
    
    damn...
    
    ok, see those ands (&) up there? They should be ors (|)
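
    With that correction applied (and the missing parenthesis restored),
    the function would read roughly as follows. This is a sketch that
    assumes int is wider than 16 bits and treats LSB_FIRST as describing
    the byte order of the data in the buffer:

    ```c
    /* Combines two bytes from the buffer into one 16-bit value.
       Define LSB_FIRST when the low-order byte comes first in the data. */
    int make_word(const unsigned char *buffer)
    {
    #ifdef LSB_FIRST
        return buffer[0] | (buffer[1] << 8);
    #else
        return (buffer[0] << 8) | buffer[1];
    #endif
    }
    ```

    Because the result is built with shifts rather than by reinterpreting
    memory, it does not depend on the byte order of the machine running
    the code.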
     
    , Apr 6, 2009
    #15
  16. Kuhl

    JosephKK Guest

    On Mon, 06 Apr 2009 10:09:52 GMT, James Kuyper
    <> wrote:

    >Kuhl wrote:
    >> Hi. If the byte order is up to each implementation of C, then is this
    >> order already fixed in a compiled executable file? ...

    >

    <snip>
    >
    >This is why it is often recommended that data to be transferred between
    >platforms should be stored in text format. The data may still need to be
    >transformed when transported to a different platform, but the issues
    >created by that kind of transformation are much easier to deal with.


    Or you could use XDR.
     
    JosephKK, Apr 13, 2009
    #16
