Noob File IO question

Discussion in 'C Programming' started by sore eyes, Apr 4, 2007.

  1. sore eyes

    sore eyes Guest

    Hi
    I just downloaded the free Watcom compiler and am having a little
    trouble with File IO http://www.openwatcom.org/index.php/Download


    I downloaded the following example and commented out the command-line
    arguments so that I could debug more easily. The example is a simple
    file copy, and it works, but I would like to customize it to
    automate some redundant changes in some large files.

    Here's the problem: when I debug this and watch the input buffer
    buf[], I don't see a stream of characters; instead I see a sequence
    of integers. I suppose these are probably the characters, but every
    other integer in the array is a 0. I am guessing the file uses 16-bit
    characters and my program thinks they are 8-bit. I tried defining
    UNICODE = 1 in my project settings, but that didn't work.

    My program will have to be able to recognize sequences of characters
    read from the file. Can someone tell me what I should do to
    recognize the characters in the buffer?

    ------------------------------------------------------------------------------------
    /*
    * stdc-file-copy.c - copy one file to a new location, possibly under a
    * different name.
    */

    #include <stdio.h> /* standard input/output routines.
    */

    #define MAX_LINE_LEN 1000 /* maximal line length supported.
    */

    /*
    * function: main. copy the given source file to the given target
    file.
    * input: path to source file and path to target file.
    * output: target file is being created with identical contents to
    * source file.
    */
    void
    main(int argc, char* argv[])
    {
    char* file_path_from; /* path to source file. */
    char* file_path_to; /* path to target file. */
    FILE* f_from; /* stream of source file. */
    FILE* f_to; /* stream of target file. */
    char buf[MAX_LINE_LEN+1]; /* input buffer. */

    /* read command line arguments */
    /*
    if (argc != 3 || !argv[1] || !argv[2]) {
    fprintf(stderr, "Usage: %s <source file path> <target file
    path>\n",
    argv[0]);
    exit(1);
    }
    file_path_from = argv[1];
    file_path_to = argv[2];
    */
    file_path_from = "newcode.html";
    file_path_to = "filecopy.out";

    /* open the source and the target files. */
    f_from = fopen(file_path_from, "r");
    if (!f_from) {
    fprintf(stderr, "Cannot open source file: ");
    perror("");
    exit(1);
    }
    f_to = fopen(file_path_to, "w+");
    if (!f_from) {
    fprintf(stderr, "Cannot open target file: ");
    perror("");
    exit(1);
    }

    /* copy source file to target file, line by line. */
    while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
    if (fputs(buf, f_to) == EOF) { /* error writing data */
    fprintf(stderr, "Error writing to target file: ");
    perror("");
    exit(1);
    }
    }
    if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF
    */
    fprintf(stderr, "Error reading from source file: ");
    perror("");
    exit(1);
    }

    /* close source and target file streams. */
    if (fclose(f_from) == EOF) {
    fprintf(stderr, "Error when closing source file: ");
    perror("");
    }
    if (fclose(f_to) == EOF) {
    fprintf(stderr, "Error when closing target file: ");
    perror("");
    }
    }
     
    sore eyes, Apr 4, 2007
    #1

  2. santosh

    santosh Guest

    Re: Noob File IO question

    sore eyes wrote:
    > Hi
    > I just downloaded the free Watcom compiler and am having a little
    > trouble with File IO http://www.openwatcom.org/index.php/Download
    >
    > I downloaded the following example and commented out the command-line
    > arguments so that I could debug more easily. The example is a simple
    > file copy, and it works, but I would like to customize it to
    > automate some redundant changes in some large files.
    >
    > Here's the problem: when I debug this and watch the input buffer
    > buf[], I don't see a stream of characters; instead I see a sequence
    > of integers. I suppose these are probably the characters, but every
    > other integer in the array is a 0. I am guessing the file uses 16-bit
    > characters and my program thinks they are 8-bit. I tried defining
    > UNICODE = 1 in my project settings, but that didn't work.
    >
    > My program will have to be able to recognize sequences of characters
    > read from the file. Can someone tell me what I should do to
    > recognize the characters in the buffer?
    >
    > ------------------------------------------------------------------------------------
    > /*
    > * stdc-file-copy.c - copy one file to a new location, possibly under a
    > * different name.
    > */
    >
    > #include <stdio.h> /* standard input/output routines.*/


    You'll also need stdlib.h for exit.

    > #define MAX_LINE_LEN 1000 /* maximal line length supported. */


    If you copied the file character by character, this wouldn't be a
    restriction.
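
A minimal sketch of the character-by-character approach mentioned above. The helper name `copy_stream` is hypothetical and not part of the posted example; it assumes both streams are already open (ideally in binary mode, "rb" and "wb", so that no newline translation takes place).

```c
#include <stdio.h>

/*
 * Copy everything from `in` to `out` one character at a time, so no
 * line-length limit applies.
 * Returns 0 on success, -1 on a read or write error.
 * copy_stream is a hypothetical helper name, not from the original example.
 */
int copy_stream(FILE *in, FILE *out)
{
    int c;  /* int, not char, so that EOF can be told apart from data */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;              /* write error */
    }
    return ferror(in) ? -1 : 0;     /* clean end of file, or a read error */
}
```

Because getc returns an int, the loop variable must not be a char; otherwise a data byte could be mistaken for EOF.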

    > /*
    > * function: main. copy the given source file to the given target file.
    > * input: path to source file and path to target file.
    > * output: target file is being created with identical contents to
    > * source file.
    > */
    > void
    > main(int argc, char* argv[])


    The return type of main should be int.

    > {
    > char* file_path_from; /* path to source file. */
    > char* file_path_to; /* path to target file. */
    > FILE* f_from; /* stream of source file. */
    > FILE* f_to; /* stream of target file. */
    > char buf[MAX_LINE_LEN+1]; /* input buffer. */
    >
    > /* read command line arguments */
    > /*
    > if (argc != 3 || !argv[1] || !argv[2]) {


    This is a faulty test. Checking argc alone is sufficient. The strings
    concerned, i.e. argv[1] and argv[2], should be checked separately.

    > fprintf(stderr, "Usage: %s <source file path> <target file
    > path>\n",
    > argv[0]);
    > exit(1);


    Exit codes other than 0, EXIT_SUCCESS and EXIT_FAILURE are not fully
    portable. stdlib.h declares exit as well as the two
    EXIT_xx macros.
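
The points about the argument test and the portable exit codes could be combined roughly as follows. This is only a sketch: `check_args` and the fallback program name "copy" are hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */

/*
 * Validate the command line for "<program> <source> <target>".
 * Returns EXIT_SUCCESS if exactly two arguments were given; otherwise
 * prints a usage message and returns EXIT_FAILURE.
 * check_args is a hypothetical helper, not part of the posted example.
 */
int check_args(int argc, char *argv[])
{
    if (argc != 3) {
        /* argv[0] can be missing or empty on some systems */
        const char *prog = (argc > 0 && argv[0] && argv[0][0]) ? argv[0] : "copy";

        fprintf(stderr, "Usage: %s <source file path> <target file path>\n", prog);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

With main declared as `int main(int argc, char *argv[])`, the result can be passed straight back, e.g. `if (check_args(argc, argv) != EXIT_SUCCESS) return EXIT_FAILURE;`.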

    > }
    > file_path_from = argv[1];
    > file_path_to = argv[2];
    > */
    > file_path_from = "newcode.html";
    > file_path_to = "filecopy.out";
    >
    > /* open the source and the target files. */
    > f_from = fopen(file_path_from, "r");
    > if (!f_from) {
    > fprintf(stderr, "Cannot open source file: ");
    > perror("");


    fopen is guaranteed by the Standard to set errno to any sensible value
    after failure. Also you're not including errno.h.

    > exit(1);
    > }
    > f_to = fopen(file_path_to, "w+");
    > if (!f_from) {
    > fprintf(stderr, "Cannot open target file: ");
    > perror("");
    > exit(1);
    > }
    >
    > /* copy source file to target file, line by line. */
    > while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
    > if (fputs(buf, f_to) == EOF) { /* error writing data */
    > fprintf(stderr, "Error writing to target file: ");
    > perror("");


    Neither is fputs guaranteed by the Standard to set errno upon error.

    > exit(1);
    > }
    > }
    > if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF
    > */
    > fprintf(stderr, "Error reading from source file: ");
    > perror("");


    You should also call perror immediately after the failing function.
    Otherwise intervening functions, like fprintf here, may themselves
    alter errno and you might get spurious messages.
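
One way to avoid the problem is to save errno right after the failing call and report it afterwards. The sketch below assumes that approach; `report_error` and `open_source` are hypothetical names. Since the C Standard does not require fopen to set errno, the code clears errno first and prints a plain message if it is still zero.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Report a failed operation on stderr. `saved_errno` must be captured by
 * the caller right after the failing call, before any other library call.
 * report_error is a hypothetical helper name.
 */
void report_error(const char *what, int saved_errno)
{
    if (saved_errno != 0)
        fprintf(stderr, "%s: %s\n", what, strerror(saved_errno));
    else
        fprintf(stderr, "%s\n", what);  /* errno was not set; no detail */
}

/* Open `path` for reading; returns NULL and reports the error on failure. */
FILE *open_source(const char *path)
{
    FILE *f;

    errno = 0;                  /* so a stale value is not misreported */
    f = fopen(path, "r");
    if (f == NULL) {
        int saved = errno;      /* capture before fprintf can change it */
        report_error("Cannot open source file", saved);
    }
    return f;
}
```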

    > exit(1);
    > }
    >
    > /* close source and target file streams. */
    > if (fclose(f_from) == EOF) {
    > fprintf(stderr, "Error when closing source file: ");
    > perror("");
    > }
    > if (fclose(f_to) == EOF) {
    > fprintf(stderr, "Error when closing target file: ");
    > perror("");
    > }
    > }
     
    santosh, Apr 4, 2007
    #2

  3. santosh

    santosh Guest

    Re: Noob File IO question

    sore eyes wrote:

    <snip>

    > /*
    > * stdc-file-copy.c - copy one file to a new location, possibly under a
    > * different name.
    > */
    >
    > #include <stdio.h> /* standard input/output routines. */
    >
    > #define MAX_LINE_LEN 1000 /* maximal line length supported. */
    >
    > /*
    > * function: main. copy the given source file to the given target file.
    > * input: path to source file and path to target file.
    > * output: target file is being created with identical contents to
    > * source file.
    > */
    > void
    > main(int argc, char* argv[])
    > {


    <snip code>

    > /* open the source and the target files. */
    > f_from = fopen(file_path_from, "r");
    > if (!f_from) {
    > fprintf(stderr, "Cannot open source file: ");
    > perror("");
    > exit(1);
    > }
    > f_to = fopen(file_path_to, "w+");
    > if (!f_from) {


    You should check f_to here.

    <snip rest>
     
    santosh, Apr 4, 2007
    #3
  4. santosh

    santosh Guest

    Re: Noob File IO question

    santosh wrote:
    > sore eyes wrote:




    > > file_path_from = "newcode.html";
    > > file_path_to = "filecopy.out";
    > >
    > > /* open the source and the target files. */
    > > f_from = fopen(file_path_from, "r");
    > > if (!f_from) {
    > > fprintf(stderr, "Cannot open source file: ");
    > > perror("");

    >
    > fopen is guaranteed by the Standard to set errno to any sensible value
    > after failure. Also you're not including errno.h.


    I meant to write:

    fopen is *not* guaranteed by the Standard to set errno to any sensible
    value after failure. Also you're not including errno.h.

    <snip>
     
    santosh, Apr 4, 2007
    #4
  5. sore eyes

    sore eyes Guest

    Re: Noob File IO question

    On 4 Apr 2007 03:33:34 -0700, "santosh" <> wrote:

    As I mentioned in my first message, this is a downloaded example that
    does indeed perform the intended function of copying a file. While I
    do appreciate you pointing out all the flaws in the code, I wish you
    would have attempted to address the problem that motivated me to
    post the message.

    As it is copying, I can watch the input buffer during debugging and
    instead of seeing

    buf[0] = 'r'
    buf[1] = 'a'
    buf[2] = 'n'
    buf[3] = 'd'
    buf[4] = 'o'
    buf[5] = 'm'

    I see a stream that looks typically like
    buf[0] = 0
    buf[1] = 71
    buf[2] = 0
    buf[3] = 76
    buf[4] = 0
    buf[5] = 79
    buf[6] = 0
    buf[7] = 84

    As I mentioned before, I suspect that the file I opened has 16 bit
    characters but the compiler/debugger is assuming 8 bit characters.
    Do you agree that this is what is probably happening?

    The intended target is an html file and I want to be able to replace
    some bad tags during a copy, so the ability to recognize and manipulate a
    sequence of characters is important. As I mentioned before, I tried
    setting UNICODE=1 in my project, but that didn't appear to affect
    anything. Anyone know what I can do about this??
     
    sore eyes, Apr 4, 2007
    #5
  6. Ian Malone

    Ian Malone Guest

    Re: Noob File IO question

    sore eyes wrote:
    > On 4 Apr 2007 03:33:34 -0700, "santosh" <> wrote:
    >


    >
    > As it is copying, I can watch the input buffer during debugging and
    > instead of seeing
    >
    > buf[0] = 'r'
    > buf[1] = 'a'
    > buf[2] = 'n'
    > buf[3] = 'd'
    > buf[4] = 'o'
    > buf[5] = 'm'
    >
    > I see a stream that looks typically like
    > buf[0] = 0
    > buf[1] = 71
    > buf[2] = 0
    > buf[3] = 76
    > buf[4] = 0
    > buf[5] = 79
    > buf[6] = 0
    > buf[7] = 84
    >
    > As I mentioned before, I suspect that the file I opened has 16 bit
    > characters but the compiler/debugger is assuming 8 bit characters.
    > Do you agree that this is what is probably happening?
    >


    Okay, that does look like a 16 bit character encoding,
    though I haven't checked that the numbers translate to
    an existing encoding.

    > The intended target is an html file and I want to be able to replace
    > some bad tags during a copy, so the ability to recognize and manipulate a
    > sequence of characters is important. As I mentioned before, I tried
    > setting UNICODE=1 in my project, but that didn't appear to affect
    > anything. Anyone know what I can do about this??
    >



    UNICODE=1 isn't going to do anything by itself; C
    just reads a file into the buffer char by char.
    If the encoding doesn't match the execution character
    set then you'll have to deal with that somehow, which
    involves determining the input encoding (if this is
    html then a UTF-16 encoding of UCS would be a
    good guess; the html spec defines ways to work it out).
    C doesn't really deal with this[1]. You may want to
    see what international character support is available
    on your platform, or just write support for the most
    common encodings (UTF-8 and UTF-16, though this will
    make text processing more difficult).

    From your original message it sounds like you want
    a copy utility which does something clever on
    encountering certain files. Practically speaking
    it needs to be able to determine the file type (by
    looking at the name, the contents or being told),
    and understand enough of the format to make its
    changes (in the case of XML et al. knowing the
    spec and therefore how to work out the encoding
    would be part of this).

    [1] There is wchar_t, along with the wide-character functions
    in <wchar.h>, but whether they will do what you want depends
    on the platform.
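
As a rough illustration of "determining the input encoding", the sketch below looks for a byte-order mark (BOM) in the first bytes of a file read in binary mode. The enum and function names are hypothetical. A file without a BOM is reported as unknown, and a UTF-32 little-endian BOM (FF FE 00 00) would be mistaken for UTF-16; a real tool would need further checks, such as the rules in the HTML spec.

```c
#include <stddef.h>

/* Encodings this sketch can tell apart; the enum names are made up here. */
enum text_encoding { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16_BE, ENC_UTF16_LE };

/*
 * Guess the encoding from a byte-order mark at the start of the data.
 * `buf` holds the first `len` bytes of the file, read in binary mode.
 * Returns ENC_UNKNOWN when no BOM is present (the file may still be
 * UTF-8 or UTF-16 without a BOM; deciding that takes other heuristics).
 */
enum text_encoding detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;    /* high byte of each unit comes first */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;    /* low byte of each unit comes first */
    return ENC_UNKNOWN;
}
```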

    --
    imalone
     
    Ian Malone, Apr 4, 2007
    #6
  7. santosh

    santosh Guest

    Re: Noob File IO question

    sore eyes wrote:
    > On 4 Apr 2007 03:33:34 -0700, "santosh" <> wrote:
    >
    > As I mentioned in my first message, this is a downloaded example that
    > does indeed perform the intended function of copying a file. While I
    > do appreciate you pointing out all the flaws in the code, I wish you
    > would have attempted to address the problem that motivated me to
    > post the message.
    >
    > As it is copying, I can watch the input buffer during debugging and
    > instead of seeing
    >
    > buf[0] = 'r'
    > buf[1] = 'a'
    > buf[2] = 'n'
    > buf[3] = 'd'
    > buf[4] = 'o'
    > buf[5] = 'm'
    >
    > I see a stream that looks typically like
    > buf[0] = 0
    > buf[1] = 71
    > buf[2] = 0
    > buf[3] = 76
    > buf[4] = 0
    > buf[5] = 79
    > buf[6] = 0
    > buf[7] = 84
    >
    > As I mentioned before, I suspect that the file I opened has 16 bit
    > characters but the compiler/debugger is assuming 8 bit characters.
    > Do you agree that this is what is probably happening?


    There's no point in agreeing, since what's happening is system specific
    and impossible to tell without further details. Did you check the
    file's encoding to see if it's actually 16 bit?

    > The intended target is an html file and I want to be able to replace
    > some bad tags during a copy, so the ability to recognize and manipulate a
    > sequence of characters is important. As I mentioned before, I tried
    > setting UNICODE=1 in my project, but that didn't appear to affect
    > anything. Anyone know what I can do about this??


    If the file is pure text and is under your control, try converting
    it to UTF-8. Otherwise you may have to change the locale for your C
    program and use the wide-character functions. What exactly needs to be
    done depends very much on what your implementation actually supports,
    as well as on the capabilities of the underlying system. Maybe these
    links will help:

    <http://evanjones.ca/unicode-in-c.html>
    <http://www.cl.cam.ac.uk/~mgk25/unicode.html>
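
If no conversion tool is at hand, the UTF-16 to UTF-8 step can also be done by hand in portable C without locales or wchar_t. The following is a sketch under stated assumptions: the function names are hypothetical, the input is raw UTF-16 bytes with any BOM already skipped, and the caller knows the byte order. The values quoted earlier in the thread (0, 71, 0, 76, 0, 79, 0, 84) would decode as big-endian "GLOT" if they start on a 16-bit boundary.

```c
#include <stddef.h>

/* Read one 16-bit code unit from two bytes in the given byte order. */
static unsigned int read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? ((unsigned int)p[0] << 8) | p[1]
                      : ((unsigned int)p[1] << 8) | p[0];
}

/*
 * Convert UTF-16 bytes to UTF-8. All names here are hypothetical.
 *   src, src_len : raw bytes as read from the file (BOM already skipped)
 *   big_endian   : nonzero if the high byte of each 16-bit unit comes first
 *   dst, dst_cap : output buffer and its capacity in bytes
 * Returns the number of bytes written, or (size_t)-1 on malformed input
 * or insufficient space. The output is not NUL-terminated.
 */
size_t utf16_to_utf8(const unsigned char *src, size_t src_len, int big_endian,
                     char *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;

    if (src_len % 2 != 0)
        return (size_t)-1;                  /* truncated code unit */

    while (i < src_len) {
        unsigned long cp = read_unit(src + i, big_endian);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) { /* high surrogate: need a pair */
            unsigned long low;
            if (i >= src_len)
                return (size_t)-1;
            low = read_unit(src + i, big_endian);
            i += 2;
            if (low < 0xDC00 || low > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000UL + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;              /* stray low surrogate */
        }

        if (cp < 0x80) {                    /* 1 byte: plain ASCII */
            if (dst_cap - out < 1) return (size_t)-1;
            dst[out++] = (char)cp;
        } else if (cp < 0x800) {            /* 2 bytes */
            if (dst_cap - out < 2) return (size_t)-1;
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000UL) {        /* 3 bytes */
            if (dst_cap - out < 3) return (size_t)-1;
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {                            /* 4 bytes */
            if (dst_cap - out < 4) return (size_t)-1;
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Once the text is UTF-8, ASCII tag names in the html can be matched with the ordinary char-based string functions.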
     
    santosh, Apr 4, 2007
    #7
  8. sore eyes

    sore eyes Guest

    Re: Noob File IO question

    Thanks for the help, santosh and Ian.

    I've been able to work around the problem for the time being by:
    1) copying the html file from my editor to the Windows clipboard
    2) pasting the file into Notepad
    3) saving the file as a txt file
    4) renaming the txt file to my original html file name.
    Apparently that sequence translates the characters into the 8-bit
    format that my program needs. I would prefer knowing the correct way
    to handle the 16-bit characters, but at least this will allow me to
    get working again on the program's logic. I did try replacing char
    with wchar, but the Watcom compiler didn't recognize that type.
    Thanks again for the assistance.
     
    sore eyes, Apr 4, 2007
    #8
  9. On Wed, 04 Apr 2007 00:45:13 -0500, sore eyes
    <are_you_kidding@target_for_Spammers.com> wrote:

    >Hi
    > I just downloaded the free Watcom compiler and am having a little
    >trouble with File IO http://www.openwatcom.org/index.php/Download
    >
    >
    >I downloaded the following example and commented out the command-line
    >arguments so that I could debug more easily. The example is a simple
    >file copy, and it works, but I would like to customize it to
    >automate some redundant changes in some large files.
    >
    >Here's the problem: when I debug this and watch the input buffer
    >buf[], I don't see a stream of characters; instead I see a sequence
    >of integers. I suppose these are probably the characters, but every
    >other integer in the array is a 0. I am guessing the file uses 16-bit
    >characters and my program thinks they are 8-bit. I tried defining
    >UNICODE = 1 in my project settings, but that didn't work.


    How was newcode.html built? Have you looked at it with a hex editor
    to see what it really contains?

    Have you tried to create a simple text file whose contents you know
    and test your program on that?
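
Without a hex editor, a few lines of C can show what the file really contains. This is a minimal sketch; `hex_dump` is a hypothetical helper, and the file should be opened with fopen(path, "rb") so that the bytes are not translated on the way in.

```c
#include <stdio.h>

/*
 * Print up to `max_bytes` bytes of `in` as two-digit hex values, 16 per
 * line, to `out`. Returns the number of bytes printed.
 * hex_dump is a hypothetical helper, not from the original example.
 */
size_t hex_dump(FILE *in, FILE *out, size_t max_bytes)
{
    size_t n = 0;
    int c;

    while (n < max_bytes && (c = getc(in)) != EOF) {
        /* a space after each byte, a newline after every 16th */
        fprintf(out, "%02X%c", (unsigned int)c, (n % 16 == 15) ? '\n' : ' ');
        n++;
    }
    if (n % 16 != 0)
        putc('\n', out);        /* finish a partial last line */
    return n;
}
```

A dump that begins with FF FE or FE FF, or that shows 00 in every other position, would support the 16-bit encoding theory.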


    Remove del for email
     
    Barry Schwarz, Apr 6, 2007
    #9