The infamous ^Z problem

Discussion in 'C Programming' started by Eigenvector, May 23, 2007.

  1. Eigenvector

    Eigenvector Guest

    I've been surfing the FAQ and Google for about a week and haven't quite
    figured out this one.

    I have a file that changes periodically, and every once in a while
    ^Zs appear in it for reasons I don't want to get into. I need to
    get rid of those ^Zs, and I need to do it in C because it is the only tool
    available to me that can handle the file size.

    So I cooked up some code and tried it out on one platform, where it works great.
    It doesn't work so great on another, and I am trying to understand why. I
    did my best to write standard C, but perhaps that is where I'm failing.

    #include <stdio.h>
    int main(int argc, char *argv[])
    {
    FILE *infile, *outfile;
    int c; /*picked that up from the FAQ */
    if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary part up
    from this google group*/
    {
    printf("Cannot open file\n");
    exit(1);
    }
    if ( (outfile = fopen("Clean_file", "w+")) == NULL)
    {
    printf("Cannot open output file\n");
    exit(1);
    }
    while ((c=fgetc(infile)) != EOF )
    {
    if(c == 0x1a) /* This is where I'm having a problem */
    /* if(c == '\0x1a') This fails with compiler error - more than one
    character defined for type char */
    {
    c='_'; /*replace bad control char with something innocuous */
    }
    fputs(c,outfile);
    }
    fclose(infile);
    fclose(outfile);
    }
    Yeah, it's pretty primitive code, but I'm more interested in getting the
    basics working before I go in and optimize the way it handles the input
    file. This compiles on xlC and HP's ANSI C compilers.

    With the first if statement dealing with the ^Z, the program doesn't detect
    the control characters in the file; with the second, the compiler
    complains about syntax. If I change c to type char, it finds the control
    characters and replaces them, but then blows away the EOF character and nukes
    the file.
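
    On the last symptom: the following is a minimal sketch, not the poster's
    code, of why the FAQ suggests an int for the result of fgetc(). The
    function names are hypothetical, and the early stop in the second function
    assumes the common case where EOF is -1 and narrowing 0xFF to a signed
    char yields -1.

```c
#include <stdio.h>

/* Counts bytes until end-of-file, holding fgetc()'s result in an int
   so that EOF stays distinct from every possible byte value. */
long count_bytes_int(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Same loop with the result narrowed to signed char, which is what a
   plain char does on platforms where char is signed. A 0xFF byte then
   typically compares equal to EOF and ends the loop early. (Where char
   is unsigned, the comparison can never succeed and the loop never
   ends, so that variant is not shown.) */
long count_bytes_narrow(FILE *fp)
{
    long n = 0;
    signed char c;

    while ((c = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```

    Whether plain char is signed differs between compilers, which may be one
    reason a char-typed c behaves differently from one platform to another.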

    I have the suspicion that it's the way I'm defining the c==\0x1a that is
    leading me astray here. I can't find any good, consistent documentation on
    exactly how to represent hex or octal in C code or string/character
    operations.
     
    Eigenvector, May 23, 2007
    #1

  2. In article <>,
    Eigenvector <> wrote:
    > if(c == 0x1a) /* This is where I'm having a problem */


    /* if(c == '\0x1a') This fails with compiler error - more than one
    character defined for type char */

    don't use the single quotes around the value...
    0x1a is an int not a char
    c is an int also.

    HTH

    --
    Mitch


    www.sand-hill.freeserve.co.uk/terminal_crazy
     
    Terminal Crazy, May 23, 2007
    #2

  3. Terminal Crazy <> writes:
    > In article <>,
    > Eigenvector <> wrote:
    >> if(c == 0x1a) /* This is where I'm having a problem */

    >
    > /* if(c == '\0x1a') This fails with compiler error - more than one
    > character defined for type char */
    >
    > don't use the single quotes around the value...
    > 0x1a is an int not a char
    > c is an int also.


    You're using the wrong syntax for a hexadecimal escape in a character
    literal. You want '\x1a', not '\0x1a'. 0x1a (without the quotation
    marks) will also work, but '\x1a' makes it clearer that you're dealing
    with a character.
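
    As a small illustration of the point above (a sketch only; the function
    names are made up for this example), the different spellings of the value
    26 compare equal in C source:

```c
/* Returns 1 when c is the Ctrl-Z value (26), else 0. */
int is_ctrl_z(int c)
{
    return c == '\x1a';   /* same value as 0x1a, '\032', or 26 */
}

/* Returns 1 if the hex escape, octal escape, hex constant and decimal
   constant all denote the same value. */
int spellings_agree(void)
{
    return 0x1a == '\x1a' && '\x1a' == '\032' && '\032' == 26;
}
```

    Note that '\0x1a' is not among them: it is a multi-character constant,
    not a hex escape.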

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 23, 2007
    #3
  4. Old Wolf

    Old Wolf Guest

    On May 23, 12:34 pm, "Eigenvector" <> wrote:
    > if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary part up
    > from this google group*/
    > {
    > printf("Cannot open file\n");
    > exit(1);
    > }
    > if ( (outfile = fopen("Clean_file", "w+")) == NULL)


    Open mode should be "w+b". You want to write it the
    same way you read it.

    > if(c == 0x1a) /* This is where I'm having a problem */
    > /* if(c == '\0x1a') This fails with compiler error - more than one
    > character defined for type char */


    There are four characters in that constant: '\0', 'x',
    '1', and 'a'. I think you mean '\x1a', although the
    uncommented code is also correct and does the same thing.

    > Yeah it's a pretty primitive code, but I'm more interested in getting the
    > basics working before I go in and optimize the way it handles the input
    > file. This compiles on xlC and HP's ANSI C compilers.


    Out of interest, how were you planning on optimising
    this? (I think you'll find that reading in a block
    at a time won't gain you anything).
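
    For comparison, a block-at-a-time version might look like the sketch
    below. The function name and the 4096-byte buffer size are arbitrary
    choices for illustration. Since stdio already buffers fgetc() and fputc(),
    whether this is any faster would have to be measured on the platform in
    question.

```c
#include <stdio.h>

/* Copies in to out one block at a time, replacing each 0x1a byte
   with '_'. Returns 0 on success, -1 on a read or write error. */
int replace_ctrl_z_blocks(FILE *in, FILE *out)
{
    unsigned char buf[4096];
    size_t n, i;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (i = 0; i < n; i++) {
            if (buf[i] == 0x1a)
                buf[i] = '_';
        }
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```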

    > In the first if statement dealing with the ^Z, the program doesn't detect
    > the control characters in the file, in the second statement the compiler
    > complains about syntax. If I set c as typecast char, it finds the control
    > characters, replaces them, but then blows away the EOF character and nukes
    > the file.


    It doesn't seem possible that your posted code won't
    find the 0x1a characters. There must be some other
    problem, e.g. this isn't your real code, or the
    non-binary output is munging up.

    > I can't find any good consistent documentation on exactly
    > how to represent hex or octal in c code or string/character
    > operations.


    Try the C Standard, or "The C Programming Language"
    by Kernighan & Ritchie.
     
    Old Wolf, May 23, 2007
    #4
  5. "Eigenvector" <> writes:
    > I've been surfing the FAQ and Google for about a week and haven't
    > quite figured out this one.
    >
    > I have a file that changes on a periodic basis and every once and a
    > while ^Zs will appear in the file for reasons I don't want to get
    > into. I need to get rid of those ^Z's and need to do it via a C code
    > as it is the only tool available to me that can handle the file size.
    >
    > So I cooked up some code, tried it out on one platform - and it works
    > great, it doesn't work so great on another and I am trying to
    > understand why. I did my best to code standard but perhaps that is
    > where I'm failing.
    >
    > #include <stdio.h>
    > int main(int argc, char *argv[])
    > {
    > FILE *infile, *outfile;
    > int c; /*picked that up from the FAQ */
    > if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary
    > part up from this google group*/


    What if argv[1] doesn't exist? Check the value of argc.

    > {
    > printf("Cannot open file\n");


    Error messages are traditionally written to stderr rather than stdout.

    > exit(1);


    The only portable values for the argument to exit() are 0,
    EXIT_SUCCESS, and EXIT_FAILURE. In this case, I'd recommend using
    EXIT_FAILURE, which would also force you to add "#include <stdlib.h>"
    (which is required for the exit() function anyway).
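
    The three points so far (check argc, write errors to stderr, use
    EXIT_FAILURE) could be sketched as follows. check_args is a hypothetical
    helper name, and the fallback program name "clean" is just a placeholder.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns EXIT_SUCCESS if an input file name was supplied; otherwise
   prints a usage message to stderr and returns EXIT_FAILURE. */
int check_args(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s input-file\n",
                argc > 0 ? argv[0] : "clean");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

    main() would call this before touching argv[1] and pass a failing result
    on to exit() or return it.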

    > }
    > if ( (outfile = fopen("Clean_file", "w+")) == NULL)


    You opened the input file in binary mode, "rb", which seems correct,
    but you opened the output file in text mode *and* update mode, even
    though you only write to it. For consistency, use "wb" (write-only,
    binary mode).

    > {
    > printf("Cannot open output file\n");
    > exit(1);
    > }
    > while ((c=fgetc(infile)) != EOF )
    > {
    > if(c == 0x1a) /* This is where I'm having a problem */
    > /* if(c == '\0x1a') This fails with compiler error - more than
    > one character defined for type char */


    0x1a should work. '\x1a' is equivalent and probably clearer.

    (A compiler *could* accept '\0x1a', but it doesn't mean what you think
    it means. The \0 represents a null character, and it's followed by
    characters 'x', '1', and 'a'. Multi-character character literals are
    legal, but their meaning is implementation-defined; they're hardly
    ever useful.)

    > {
    > c='_'; /*replace bad control char with something innocuous */
    > }
    > fputs(c,outfile);
    > }
    > fclose(infile);
    > fclose(outfile);


    "return 0;" or "exit(0);".

    > }
    > Yeah it's a pretty primitive code, but I'm more interested in getting
    > the basics working before I go in and optimize the way it handles the
    > input file. This compiles on xlC and HP's ANSI C compilers.


    <OT>Since you're using Unix-like systems, "man tr".</OT>

    > In the first if statement dealing with the ^Z, the program doesn't
    > detect the control characters in the file,


    I don't know why it would cause that problem. I suspect you may be
    misinterpreting the symptoms, but it's hard to tell.

    > in the second statement the
    > compiler complains about syntax. If I set c as typecast char, it
    > finds the control characters, replaces them, but then blows away the
    > EOF character and nukes the file.
    >
    > I have the suspicion that its the way I'm defining the c==\0x1a that
    > is leading my astray here. I can't find any good consistent
    > documentation on exactly how to represent hex or octal in c code or
    > string/character operations.


    Really? Any decent C reference should explain that. If nothing else,
    you can get the latest draft of the C standard at
    <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>; see
    sections 6.4.4.4 and 6.4.5.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 23, 2007
    #5
  6. Eigenvector

    Eigenvector Guest

    "Keith Thompson" <> wrote in message
    news:...
    > "Eigenvector" <> writes:
    >> I've been surfing the FAQ and Google for about a week and haven't
    >> quite figured out this one.
    >>
    >> I have a file that changes on a periodic basis and every once and a
    >> while ^Zs will appear in the file for reasons I don't want to get
    >> into. I need to get rid of those ^Z's and need to do it via a C code
    >> as it is the only tool available to me that can handle the file size.
    >>
    >> So I cooked up some code, tried it out on one platform - and it works
    >> great, it doesn't work so great on another and I am trying to
    >> understand why. I did my best to code standard but perhaps that is
    >> where I'm failing.
    >>
    >> #include <stdio.h>
    >> int main(int argc, char *argv[])
    >> {
    >> FILE *infile, *outfile;
    >> int c; /*picked that up from the FAQ */
    >> if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary
    >> part up from this google group*/

    >
    > What if argv[1] doesn't exist? Check the value of argc.
    >
    >> {
    >> printf("Cannot open file\n");

    >
    > Error messages are traditionally written to stderr rather than stdout.
    >
    >> exit(1);

    >
    > The only portable values for the argument to exit() are 0,
    > EXIT_SUCCESS, and EXIT_FAILURE. In this case, I'd recommend using
    > EXIT_FAILURE, which would also force you to add "#include <stdlib.h>"
    > (which is required for the exit() function anyway).
    >
    >> }
    >> if ( (outfile = fopen("Clean_file", "w+")) == NULL)

    >
    > You opened the input file in binary mode, "rb", which seems correct,
    > but you opened the output file in text mode *and* update mode, even
    > though you only write to it. For consistency, use "wb" (write-only,
    > binary mode).


    I won't argue the advantages of reading and writing cleanly, although you
    are certainly correct here. I'm just trying to pound out something that
    will work - more concept than production code. Although I will take your
    suggestions to heart.

    >
    >> {
    >> printf("Cannot open output file\n");
    >> exit(1);
    >> }
    >> while ((c=fgetc(infile)) != EOF )
    >> {
    >> if(c == 0x1a) /* This is where I'm having a problem */
    >> /* if(c == '\0x1a') This fails with compiler error - more than
    >> one character defined for type char */

    >
    > 0x1a should work. '\x1a' is equivalent and probably clearer.


    Okay, I see now where I went wrong. \0 is for octal representation rather than
    hex. Let me go back and try '\x1a' and see if I do better.


    >
    > (A compiler *could* accept '\0x1a', but it does't mean what you think
    > it means. The \0 represents a null character, and it's followed by
    > characters 'x', '1', and 'a'. Multi-character character literals are
    > legal, but their meaning is implementation-defined; they're hardly
    > ever useful.)
    >
    >> {
    >> c='_'; /*replace bad control char with something innocuous */
    >> }
    >> fputs(c,outfile);
    >> }
    >> fclose(infile);
    >> fclose(outfile);

    >
    > "return 0;" or "exit(0);".
    >
    >> }
    >> Yeah it's a pretty primitive code, but I'm more interested in getting
    >> the basics working before I go in and optimize the way it handles the
    >> input file. This compiles on xlC and HP's ANSI C compilers.

    >
    > <OT>Since you're using Unix-like systems, "man tr".</OT>


    Actually `tr` absolutely doesn't work here, the ^Z is its death (same with
    sed, batch VI, and a host of other shell related commands), but I won't
    discuss that here. Besides I will at some point need to port this to
    Windoze.

    >
    >> In the first if statement dealing with the ^Z, the program doesn't
    >> detect the control characters in the file,

    >
    > I don't know why it would cause that problem. I suspect you may be
    > misinterpreting the symptoms, but it's hard to tell.


    Agreed it's hard to diagnose code over a newsgroup. In the code I have
    working I put a puts() statement in the if branch to output whenever the
    conditional was met. When I use the 0x1a notation the if conditional is
    never accessed, although the program completes normally.

    >
    >> in the second statement the
    >> compiler complains about syntax. If I set c as typecast char, it
    >> finds the control characters, replaces them, but then blows away the
    >> EOF character and nukes the file.
    >>
    >> I have the suspicion that its the way I'm defining the c==\0x1a that
    >> is leading my astray here. I can't find any good consistent
    >> documentation on exactly how to represent hex or octal in c code or
    >> string/character operations.

    >
    > Really? Any decent C reference should explain that. If nothing else,
    > you can get the latest draft of the C standard at
    > <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>; see
    > sections 6.4.4.4 and 6.4.5.
    >
    > --
    > Keith Thompson (The_Other_Keith)
    > <http://www.ghoti.net/~kst>
    > San Diego Supercomputer Center <*>
    > <http://users.sdsc.edu/~kst>
    > "We must do something. This is something. Therefore, we must do this."
    > -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Eigenvector, May 23, 2007
    #6
  7. Eigenvector

    Eigenvector Guest

    "Old Wolf" <> wrote in message
    news:...
    > On May 23, 12:34 pm, "Eigenvector" <> wrote:
    >> if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary part
    >> up
    >> from this google group*/
    >> {
    >> printf("Cannot open file\n");
    >> exit(1);
    >> }
    >> if ( (outfile = fopen("Clean_file", "w+")) == NULL)

    >
    > Open mode should be "w+b". You want to write it the
    > same way you read it.
    >
    >> if(c == 0x1a) /* This is where I'm having a problem */
    >> /* if(c == '\0x1a') This fails with compiler error - more than
    >> one
    >> character defined for type char */

    >
    > There are four characters in that constant: '\0', 'x',
    > '1', and 'a'. I think you mean '\x1a', although the
    > uncommented code is also correct and does the same thing.
    >
    >> Yeah it's a pretty primitive code, but I'm more interested in getting the
    >> basics working before I go in and optimize the way it handles the input
    >> file. This compiles on xlC and HP's ANSI C compilers.

    >
    > Out of interest, how were you planning on optimising
    > this? (I think you'll find that reading in a block
    > at a time won't gain you anything).
    >
    >> In the first if statement dealing with the ^Z, the program doesn't detect
    >> the control characters in the file, in the second statement the compiler
    >> complains about syntax. If I set c as typecast char, it finds the
    >> control
    >> characters, replaces them, but then blows away the EOF character and
    >> nukes
    >> the file.

    >
    > It doesn't seem possible that your posted code won't
    > find the 0x1a characters. There must be some other
    > problem, e.g. this isn't your real code, or the
    > non-binary output is munging up.


    This is the real code, albeit definitely primitive. I would have thought it
    would have found the ^Zs too, but on a SINGLE platform it doesn't. I trust
    that the platform is ANSI compliant, so that tells me that my problems lie
    with the code ultimately.

    >
    >> I can't find any good consistent documentation on exactly
    >> how to represent hex or octal in c code or string/character
    >> operations.

    >
    > Try the C Standard, or "The C Programming Language"
    > by Kernighan & Ritchie.
    >
     
    Eigenvector, May 23, 2007
    #7
  8. "Eigenvector" <> writes:
    > "Keith Thompson" <> wrote in message
    > news:...

    [...]
    >> <OT>Since you're using Unix-like systems, "man tr".</OT>

    >
    > Actually `tr` absolutely doesn't work here, the ^Z is its death (same with
    > sed, batch VI, and a host of other shell related commands), but I won't
    > discuss that here. Besides I will at some point need to port this to
    > Windoze.

    [...]

    Since your original program *should* have worked, and since the "tr"
    command does work for me, I'm beginning to suspect that the characters
    in your file aren't what you think they are.

    The "^Z" character is 26 decimal, '\032' octal, or '\x1a' hexadecimal
    (that's ASCII-specific). How do you know that's what's in your file?

    This works for me on a Unix system:

    tr '\032' _ < tmp.txt

    (the quotation marks are necessary).

    Try writing a program that prints (say, in decimal) the value of any
    non-printable character; you can use the isprint() function, declared
    in <ctype.h>. You can also exclude '\n' characters. If you get
    values other than 26 (154, maybe?), that's probably your problem.
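
    A minimal sketch of such a diagnostic is below. The function name and the
    output format are assumptions made for illustration; it reports the offset
    and decimal value of each byte that is neither printable nor '\n'.

```c
#include <ctype.h>
#include <stdio.h>

/* Writes "offset N: V" to out for every byte of in that is neither
   printable nor a newline, and returns how many such bytes were seen.
   The stream in should have been opened in binary mode. */
long report_nonprintable(FILE *in, FILE *out)
{
    long offset = 0;
    long found = 0;
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c != '\n' && !isprint(c)) {
            fprintf(out, "offset %ld: %d\n", offset, c);
            found++;
        }
        offset++;
    }
    return found;
}
```

    Called with stdout as the second argument, it prints one line per suspect
    byte, which makes it easy to see whether the file really contains 26.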

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 23, 2007
    #8
  9. "Eigenvector" <> wrote in message
    news:...
    >
    > "Old Wolf" <> wrote in message
    > news:...
    >> On May 23, 12:34 pm, "Eigenvector" <> wrote:
    >>> if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary
    >>> part up
    >>> from this google group*/
    >>> {
    >>> printf("Cannot open file\n");
    >>> exit(1);
    >>> }
    >>> if ( (outfile = fopen("Clean_file", "w+")) == NULL)

    >>
    >> Open mode should be "w+b". You want to write it the
    >> same way you read it.
    >>
    >>> if(c == 0x1a) /* This is where I'm having a problem */
    >>> /* if(c == '\0x1a') This fails with compiler error - more than
    >>> one
    >>> character defined for type char */

    >>
    >> There are four characters in that constant: '\0', 'x',
    >> '1', and 'a'. I think you mean '\x1a', although the
    >> uncommented code is also correct and does the same thing.
    >>
    >>> Yeah it's a pretty primitive code, but I'm more interested in getting
    >>> the
    >>> basics working before I go in and optimize the way it handles the input
    >>> file. This compiles on xlC and HP's ANSI C compilers.

    >>
    >> Out of interest, how were you planning on optimising
    >> this? (I think you'll find that reading in a block
    >> at a time won't gain you anything).
    >>
    >>> In the first if statement dealing with the ^Z, the program doesn't
    >>> detect
    >>> the control characters in the file, in the second statement the compiler
    >>> complains about syntax. If I set c as typecast char, it finds the
    >>> control
    >>> characters, replaces them, but then blows away the EOF character and
    >>> nukes
    >>> the file.

    >>
    >> It doesn't seem possible that your posted code won't
    >> find the 0x1a characters. There must be some other
    >> problem, e.g. this isn't your real code, or the
    >> non-binary output is munging up.

    >
    > This is the real code, albeit definitely primitive. I would have thought
    > it would have found the ^Zs too, but on a SINGLE platform it doesn't. I
    > trust that the platform is ANSI compliant, so that tells me that my
    > problems lie with the code ultimately.

    ^Z is typically used for job control in shells that support it, to suspend
    the foreground job, so the shell would consume it before your program gets a
    chance... check the output of 'stty -a'.

    This is of course OT here, and I might also be utterly wrong...

    Bye, Jojo
     
    Joachim Schmitz, May 23, 2007
    #9
  10. "Joachim Schmitz" <> writes:
    [...]
    > ^Z typically is used for Job-Control, in Shells that support it, to suspend
    > the forground job, so the shell would consum it before your program gets a
    > chance... check the output of 'stty -a'
    >
    > This is of course OT here and also I might be utterly wrong...


    That doesn't apply when reading from a file, as the OP is doing.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 23, 2007
    #10
  11. Eigenvector

    Eigenvector Guest

    "Keith Thompson" <> wrote in message
    news:...
    > "Eigenvector" <> writes:
    >> "Keith Thompson" <> wrote in message
    >> news:...

    > [...]
    >>> <OT>Since you're using Unix-like systems, "man tr".</OT>

    >>
    >> Actually `tr` absolutely doesn't work here, the ^Z is its death (same
    >> with
    >> sed, batch VI, and a host of other shell related commands), but I won't
    >> discuss that here. Besides I will at some point need to port this to
    >> Windoze.

    > [...]
    >
    > Since your original program *should* have worked, and since the "tr"
    > command does work for me, I'm beginning to suspect that the characters
    > in your file aren't what you think they are.
    >
    > The "^Z" character is 26 decimal, '\032' octal, or '\x1a' hexadecimal
    > (that's ASCII-specific). How do you know that's what's in your file?
    >
    > This works for me on a Unix system:
    >
    > tr '\032' _ < tmp.txt
    >
    > (the quotation marks are necessary).
    >
    > Try writing a program that prints (say, in decimal) the value of any
    > non-printable character; you can use the isprint() function, declared
    > in <ctype.h>. You can also exclude '\n' characters. If you get
    > values other than 26 (154, maybe?), that's probably your problem.
    >
    > --
    > Keith Thompson (The_Other_Keith)
    > <http://www.ghoti.net/~kst>



    Well thanks for the tips all. I believe the solution was how I was defining
    \x1a. Once I got the syntax on it correct using '\x1a' the code on that
    remaining system worked.

    Frankly, I'm stunned at how fast the program works on the huge files I
    have to process - much, much faster than the built-in OS code.
     
    Eigenvector, May 24, 2007
    #11
  12. "Eigenvector" <> writes:
    [...]
    > Well thanks for the tips all. I believe the solution was how I was defining
    > \x1a. Once I got the syntax on it correct using '\x1a' the code on that
    > remaining system worked.


    That's very surprising. The code you originally posted used the
    integer constant 0x1a (without quotation marks), which should have
    worked. <OT>The "tr" command should also have worked for you.</OT>
    There were some other problems in your code (which were already
    pointed out), but I don't think any of them should have prevented it
    from working.

    But in any case, I'm glad your problem is solved.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 24, 2007
    #12
  13. Bill Latvin

    Bill Latvin Guest

    On Tue, 22 May 2007 17:34:41 -0700, "Eigenvector"
    <> wrote:

    >I've been surfing the FAQ and Google for about a week and haven't quite
    >figured out this one.
    >
    >I have a file that changes on a periodic basis and every once and a while
    >^Zs will appear in the file for reasons I don't want to get into. I need to
    >get rid of those ^Z's and need to do it via a C code as it is the only tool
    >available to me that can handle the file size.
    >
    >So I cooked up some code, tried it out on one platform - and it works great,
    >it doesn't work so great on another and I am trying to understand why. I
    >did my best to code standard but perhaps that is where I'm failing.
    >
    >#include <stdio.h>
    >int main(int argc, char *argv[])
    >{
    > FILE *infile, *outfile;
    > int c; /*picked that up from the FAQ */
    > if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary part up
    >from this google group*/
    > {
    > printf("Cannot open file\n");
    > exit(1);
    > }
    > if ( (outfile = fopen("Clean_file", "w+")) == NULL)
    > {
    > printf("Cannot open output file\n");
    > exit(1);
    > }
    > while ((c=fgetc(infile)) != EOF )
    > {
    > if(c == 0x1a) /* This is where I'm having a problem */
    > /* if(c == '\0x1a') This fails with compiler error - more than one
    >character defined for type char */
    > {
    > c='_'; /*replace bad control char with something innocuous */
    > }
    > fputs(c,outfile);
    > }
    > fclose(infile);
    > fclose(outfile);
    >}
    >Yeah it's a pretty primitive code, but I'm more interested in getting the
    >basics working before I go in and optimize the way it handles the input
    >file. This compiles on xlC and HP's ANSI C compilers.
    >
    >In the first if statement dealing with the ^Z, the program doesn't detect
    >the control characters in the file, in the second statement the compiler
    >complains about syntax. If I set c as typecast char, it finds the control
    >characters, replaces them, but then blows away the EOF character and nukes
    >the file.
    >
    >I have the suspicion that its the way I'm defining the c==\0x1a that is
    >leading my astray here. I can't find any good consistent documentation on
    >exactly how to represent hex or octal in c code or string/character
    >operations.
    >


    It didn't compile cleanly with xlC for me until I changed this:

    if ( (infile = fopen(argv[1], "rb") == NULL)
    to this:
    if ( (infile = fopen(argv[1], "rb") ) == NULL)

    and this:
    fputs(c,outfile);
    to this:
    fputc(c,outfile);

    Then it compiled, and worked correctly.
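
    Putting these two fixes together with the earlier suggestions in the
    thread (binary mode for both files, errors to stderr), a cleaned-up
    version might look like the sketch below. It is not the original poster's
    final code; the split into two functions and their names are choices made
    for illustration.

```c
#include <stdio.h>

/* Copies in to out, replacing each Ctrl-Z (0x1a) byte with '_'.
   Returns 0 on success, -1 on a read or write error. */
int replace_ctrl_z(FILE *in, FILE *out)
{
    int c;   /* int, so EOF can be told apart from a 0xFF byte */

    while ((c = fgetc(in)) != EOF) {
        if (c == '\x1a')
            c = '_';
        if (fputc(c, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}

/* Opens both files in binary mode and runs the filter.
   Returns 0 on success, -1 on any failure. */
int clean_file(const char *in_path, const char *out_path)
{
    FILE *in, *out;
    int status;

    if ((in = fopen(in_path, "rb")) == NULL) {
        fprintf(stderr, "Cannot open %s\n", in_path);
        return -1;
    }
    if ((out = fopen(out_path, "wb")) == NULL) {
        fprintf(stderr, "Cannot open %s\n", out_path);
        fclose(in);
        return -1;
    }
    status = replace_ctrl_z(in, out);
    fclose(in);
    if (fclose(out) == EOF)   /* a failed flush also counts as an error */
        status = -1;
    return status;
}
```

    main() would then check argc, call clean_file(argv[1], "Clean_file"), and
    return EXIT_FAILURE from <stdlib.h> if the result is nonzero.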

    Bill
     
    Bill Latvin, May 24, 2007
    #13
  14. Eigenvector

    Eigenvector Guest

    "Bill Latvin" <> wrote in message
    news:...
    > On Tue, 22 May 2007 17:34:41 -0700, "Eigenvector"
    > <> wrote:
    >
    >>I've been surfing the FAQ and Google for about a week and haven't quite
    >>figured out this one.
    >>
    >>I have a file that changes on a periodic basis and every once and a while
    >>^Zs will appear in the file for reasons I don't want to get into. I need
    >>to
    >>get rid of those ^Z's and need to do it via a C code as it is the only
    >>tool
    >>available to me that can handle the file size.
    >>
    >>So I cooked up some code, tried it out on one platform - and it works
    >>great,
    >>it doesn't work so great on another and I am trying to understand why. I
    >>did my best to code standard but perhaps that is where I'm failing.
    >>
    >>#include <stdio.h>
    >>int main(int argc, char *argv[])
    >>{
    >> FILE *infile, *outfile;
    >> int c; /*picked that up from the FAQ */
    >> if ( (infile = fopen(argv[1], "rb") == NULL) /*picked the binary part
    >> up
    >>from this google group*/
    >> {
    >> printf("Cannot open file\n");
    >> exit(1);
    >> }
    >> if ( (outfile = fopen("Clean_file", "w+")) == NULL)
    >> {
    >> printf("Cannot open output file\n");
    >> exit(1);
    >> }
    >> while ((c=fgetc(infile)) != EOF )
    >> {
    >> if(c == 0x1a) /* This is where I'm having a problem */
    >> /* if(c == '\0x1a') This fails with compiler error - more than
    >> one
    >>character defined for type char */
    >> {
    >> c='_'; /*replace bad control char with something innocuous */
    >> }
    >> fputs(c,outfile);
    >> }
    >> fclose(infile);
    >> fclose(outfile);
    >>}
    >>Yeah it's a pretty primitive code, but I'm more interested in getting the
    >>basics working before I go in and optimize the way it handles the input
    >>file. This compiles on xlC and HP's ANSI C compilers.
    >>
    >>In the first if statement dealing with the ^Z, the program doesn't detect
    >>the control characters in the file, in the second statement the compiler
    >>complains about syntax. If I set c as typecast char, it finds the control
    >>characters, replaces them, but then blows away the EOF character and nukes
    >>the file.
    >>
    >>I have the suspicion that its the way I'm defining the c==\0x1a that is
    >>leading my astray here. I can't find any good consistent documentation on
    >>exactly how to represent hex or octal in c code or string/character
    >>operations.
    >>

    >
    > It didn't compile cleanly with xlC for me until I changed this:
    >
    > if ( (infile = fopen(argv[1], "rb") == NULL)
    > to this:
    > if ( (infile = fopen(argv[1], "rb") ) == NULL)
    >
    > and this:
    > fputs(c,outfile);
    > to this:
    > fputc(c,outfile);
    >
    > Then it compiled, and worked correctly.
    >
    > Bill



    I apologize for that, I typed it in incorrectly. My code sheet says fputc
    not fputs - sorry for any confusion that caused. I don't have the ability
    to cut and paste code from this particular system, I have to rely on
    transcription.
     
    Eigenvector, May 24, 2007
    #14
  15. In article <>,
    CBFalconer <> wrote:
    >Eigenvector wrote:


    >> Actually `tr` absolutely doesn't work here, the ^Z is its death
    >> (same with sed, batch VI, and a host of other shell related
    >> commands), but I won't discuss that here. Besides I will at some
    >> point need to port this to Windoze.


    >Then you should probably be using EOF. The Unix/Linux equivalent
    >of ^Z is ^D. Depends on where your input originates.


    Unix/Linux are not hardwired to ^D; that's merely the most common
    default. The technical details of adjusting the end-of-file
    character are off-topic for this newsgroup, though.
    --
    Prototypes are supertypes of their clones. -- maplesoft
     
    Walter Roberson, May 24, 2007
    #15
  16. CBFalconer <> writes:
    > Eigenvector wrote:
    >>

    > ... snip ...
    >>
    >> Actually `tr` absolutely doesn't work here, the ^Z is its death
    >> (same with sed, batch VI, and a host of other shell related
    >> commands), but I won't discuss that here. Besides I will at some
    >> point need to port this to Windoze.

    >
    > Then you should probably be using EOF. The Unix/Linux equivalent
    > of ^Z is ^D. Depends on where your input originates.


    Eigenvector was talking about ^Z characters (ASCII 26) in a file.
    Since he was opening the input file in binary mode, the character used
    to signal an end-of-file on interactive input should be irrelevant.

    I still don't know why he was having the problems he described, but
    he's since solved them.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, May 24, 2007
    #16