Noob File IO question

sore eyes

Hi
I just downloaded the free Watcom compiler and am having a little
trouble with File IO http://www.openwatcom.org/index.php/Download


I downloaded the following example and commented out the command-line
argument handling so that I could debug more easily. The example is a
simple file copy, and it works, but I would like to customize it to
automate some redundant changes in some large files.

Here's the problem: when I debug this and watch the input buffer
buf[], I don't see a stream of characters; instead I see a sequence
of integers. I suppose these are probably the characters, but every
other integer in the array is a 0. I am guessing the file uses 16-bit
characters and my program thinks they are 8-bit. I tried defining
UNICODE = 1 in my project settings, but that didn't work.

My program will have to be able to recognize sequences of characters
read from the file. Can someone tell me what I should do to
recognize the characters in the buffer?

------------------------------------------------------------------------------------
/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines. */

#define MAX_LINE_LEN 1000 /* maximal line length supported. */

/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])
{
char* file_path_from; /* path to source file. */
char* file_path_to; /* path to target file. */
FILE* f_from; /* stream of source file. */
FILE* f_to; /* stream of target file. */
char buf[MAX_LINE_LEN+1]; /* input buffer. */

/* read command line arguments */
/*
if (argc != 3 || !argv[1] || !argv[2]) {
fprintf(stderr, "Usage: %s <source file path> <target file path>\n",
argv[0]);
exit(1);
}
file_path_from = argv[1];
file_path_to = argv[2];
*/
file_path_from = "newcode.html";
file_path_to = "filecopy.out";

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");
exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {
fprintf(stderr, "Cannot open target file: ");
perror("");
exit(1);
}

/* copy source file to target file, line by line. */
while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
if (fputs(buf, f_to) == EOF) { /* error writing data */
fprintf(stderr, "Error writing to target file: ");
perror("");
exit(1);
}
}
if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF */
fprintf(stderr, "Error reading from source file: ");
perror("");
exit(1);
}

/* close source and target file streams. */
if (fclose(f_from) == EOF) {
fprintf(stderr, "Error when closing source file: ");
perror("");
}
if (fclose(f_to) == EOF) {
fprintf(stderr, "Error when closing target file: ");
perror("");
}
}
 
santosh

sore said:
------------------------------------------------------------------------------------
/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines.*/

You'll also need stdlib.h for exit.
#define MAX_LINE_LEN 1000 /* maximal line length supported. */

If you did a character by character copying this wouldn't be a
restriction.
/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])

Return type of main should be an int.
{
char* file_path_from; /* path to source file. */
char* file_path_to; /* path to target file. */
FILE* f_from; /* stream of source file. */
FILE* f_to; /* stream of target file. */
char buf[MAX_LINE_LEN+1]; /* input buffer. */

/* read command line arguments */
/*
if (argc != 3 || !argv[1] || !argv[2]) {

This is a faulty test. Checking argc alone is sufficient. The strings
concerned, i.e. argv[1] and argv[2], should be checked separately.
fprintf(stderr, "Usage: %s <source file path> <target file path>\n",
argv[0]);
exit(1);

Any value other than 0, EXIT_SUCCESS and EXIT_FAILURE is not a fully
portable exit status. stdlib.h declares exit as well as the two
EXIT_xx macros.
}
file_path_from = argv[1];
file_path_to = argv[2];
*/
file_path_from = "newcode.html";
file_path_to = "filecopy.out";

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");

fopen is guaranteed by the Standard to set errno to any sensible value
after failure. Also you're not including errno.h.
exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {
fprintf(stderr, "Cannot open target file: ");
perror("");
exit(1);
}

/* copy source file to target file, line by line. */
while (fgets(buf, MAX_LINE_LEN+1, f_from)) {
if (fputs(buf, f_to) == EOF) { /* error writing data */
fprintf(stderr, "Error writing to target file: ");
perror("");

Neither is fputs guaranteed by the Standard to set errno upon error.
exit(1);
}
}
if (!feof(f_from)) { /* fgets failed _not_ due to encountering EOF */
fprintf(stderr, "Error reading from source file: ");
perror("");

You should also call perror immediately after the failing function.
Otherwise intervening functions like fprintf here may themselves
alter errno and you might get spurious messages.
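
The review points above can be pulled together in a short sketch. This is not the downloaded example's author's code: `copy_file` is a hypothetical helper name, and the choice of binary mode ("rb"/"wb") is an assumption made so that the bytes are copied untouched. It copies byte by byte, so no line-length limit applies, and it checks both streams after opening them.

```c
#include <stdio.h>

/* Copy src_path to dst_path one byte at a time.
 * Returns 0 on success, -1 on any failure.
 * copy_file is a hypothetical name used only for this sketch. */
static int copy_file(const char *src_path, const char *dst_path)
{
    FILE *in;
    FILE *out;
    int c;
    int status = 0;

    in = fopen(src_path, "rb");
    if (in == NULL) {
        /* perror is called right after the failing call; ISO C does not
         * promise that fopen sets errno, so the text may be unhelpful. */
        perror(src_path);
        return -1;
    }

    out = fopen(dst_path, "wb");
    if (out == NULL) {          /* check the target stream, not the source */
        perror(dst_path);
        fclose(in);
        return -1;
    }

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            status = -1;        /* write error */
            break;
        }
    }
    if (ferror(in))
        status = -1;            /* the loop ended on a read error, not EOF */

    if (fclose(in) == EOF)
        status = -1;
    if (fclose(out) == EOF)
        status = -1;
    return status;
}
```

A main that returns int could check argc, call copy_file(argv[1], argv[2]), and return EXIT_SUCCESS or EXIT_FAILURE from stdlib.h depending on the result.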
 
santosh

sore eyes wrote:

/*
* stdc-file-copy.c - copy one file to a new location, possibly under a
* different name.
*/

#include <stdio.h> /* standard input/output routines. */

#define MAX_LINE_LEN 1000 /* maximal line length supported. */

/*
* function: main. copy the given source file to the given target file.
* input: path to source file and path to target file.
* output: target file is being created with identical contents to
* source file.
*/
void
main(int argc, char* argv[])
{

/* open the source and the target files. */
f_from = fopen(file_path_from, "r");
if (!f_from) {
fprintf(stderr, "Cannot open source file: ");
perror("");
exit(1);
}
f_to = fopen(file_path_to, "w+");
if (!f_from) {

You should check f_to here.

<snip rest>
 
santosh

santosh said:
sore eyes wrote:



fopen is guaranteed by the Standard to set errno to any sensible value
after failure. Also you're not including errno.h.

I meant to write:

fopen is *not* guaranteed by the Standard to set errno to any sensible
value after failure. Also you're not including errno.h.
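
One way to live with that, sketched under assumptions: clear errno before the call, save it right after a failure, and only print a system message if it was actually set. `open_or_report` is a hypothetical helper name, not something from the original example.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Open a file and, on failure, report it on stderr.
 * open_or_report is a hypothetical name used only for this sketch. */
static FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp;

    errno = 0;                  /* so a stale value is not mistaken for a cause */
    fp = fopen(path, mode);
    if (fp == NULL) {
        int saved = errno;      /* capture before fprintf can change it */
        if (saved != 0)
            fprintf(stderr, "Cannot open %s: %s\n", path, strerror(saved));
        else
            fprintf(stderr, "Cannot open %s\n", path);
    }
    return fp;
}
```

Many implementations do set errno when fopen fails, so the first branch is the usual one in practice; the second is the fallback for those that do not.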

<snip>
 
sore eyes

As I mentioned in my first message, this is a downloaded example that
does indeed perform the intended function of copying a file. While I
do appreciate you pointing out all the flaws in the code, I wish you
had attempted to address the problem that motivated me to post the
message.

As it is copying, I can watch the input buffer during debugging and
instead of seeing

buf[0] = 'r'
buf[1] = 'a'
buf[2] = 'n'
buf[3] = 'd'
buf[4] = 'o'
buf[5] = 'm'

I see a stream that looks typically like
buf[0] = 0
buf[1] = 71
buf[2] = 0
buf[3] = 76
buf[4] = 0
buf[5] = 79
buf[6] = 0
buf[7] = 84

As I mentioned before, I suspect that the file I opened has 16 bit
characters but the compiler/debugger is assuming 8 bit characters.
Do you agree that this is what is probably happening?

The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Does anyone know what I can do about this?
 
Ian Malone

As I mentioned before, I suspect that the file I opened has 16 bit
characters but the compiler/debugger is assuming 8 bit characters.
Do you agree that this is what is probably happening?

Okay, that does look like a 16 bit character encoding,
though I haven't checked that the numbers translate to
an existing encoding.
The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Does anyone know what I can do about this?


UNICODE=1 isn't going to do anything by itself, C
just reads a file into the buffer char by char.
If the encoding doesn't match the execution character
set then you'll have to deal with that somehow, which
involves determining the input encoding (if this is
html then a UTF-16 encoded variant of UCS would be a
good guess, the html spec defines ways to work it out).
C doesn't really deal with this[1], you may want to
see what international character support is available
on your platform, or just write support for the most
common variants (UTF-8 and UTF-16 UCS, but this will
make text processing more difficult).

From your original message it sounds like you want
a copy utility which does something clever on
encountering certain files. Practically speaking
it needs to be able to determine the file type (by
looking at the name, the contents or being told),
and understand enough of the format to make its
changes (in the case of XML et al. knowing the
spec and therefore how to work out the encoding
would be part of this).

[1] There is wchar_t (with the wide-character functions in <wchar.h>),
but whether it will do what you want depends on the platform.
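
As a rough illustration of "determining the input encoding", the sketch below looks at the first few bytes of a file: it checks for a byte order mark and, failing that, uses the zero-byte pattern described earlier in the thread as a hint. The enum and function names are hypothetical, the zero-byte test is only a heuristic that works for mostly-ASCII text, and UTF-32 is not considered at all.

```c
#include <stddef.h>

/* Hypothetical names for this sketch only. */
enum text_encoding { ENC_UNKNOWN, ENC_UTF8_BOM, ENC_UTF16_BE, ENC_UTF16_LE };

/* Guess the encoding from the first n bytes of a file. */
static enum text_encoding guess_encoding(const unsigned char *buf, size_t n)
{
    /* Byte order marks, when present, are the most reliable signal. */
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;

    /* Heuristic: ASCII text stored as UTF-16 has a zero byte in every
     * pair; which half is zero suggests the byte order. */
    if (n >= 2 && buf[0] == 0 && buf[1] != 0)
        return ENC_UTF16_BE;
    if (n >= 2 && buf[0] != 0 && buf[1] == 0)
        return ENC_UTF16_LE;

    return ENC_UNKNOWN;         /* plain 8-bit text, UTF-8 without BOM, ... */
}
```

The bytes should come from a stream opened in binary mode ("rb"), since text mode may alter or truncate data that is not really 8-bit text. For HTML, the rules in the HTML specification remain the authoritative way to work out the encoding.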
 
santosh

sore said:
As I mentioned before, I suspect that the file I opened has 16 bit
characters but the compiler/debugger is assuming 8 bit characters.
Do you agree that this is what is probably happening?

There's no point in agreeing since what's happening is system specific
and impossible to tell without further details. Did you check the
file's encoding to see if it's actually 16 bit?
The intended target is an html file and I want to be able to replace
some bad tags during a copy, so the ability to recognize and manipulate
a sequence of characters is important. As I mentioned before, I tried
setting UNICODE=1 in my project, but that didn't appear to affect
anything. Does anyone know what I can do about this?

If the file is pure text and is under your control, try converting
it to UTF-8. Otherwise you may have to change the locale for your C
program and use the wide-character functions. What exactly needs to be
done is very much dependent on what your implementation actually
supports as well as the capabilities of the underlying system. Maybe
these links will help:

<http://evanjones.ca/unicode-in-c.html>
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
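
If converting the file by hand is not an option, the conversion to UTF-8 can also be done in the program. The following is a hand-rolled sketch, not a library API: `put_utf8`, `read_unit` and `utf16_to_utf8` are hypothetical names, the byte order must be supplied by the caller, and unpaired surrogates are replaced with U+FFFD. Where the platform offers a conversion facility (for example iconv on POSIX systems or WideCharToMultiByte on Windows), that is probably the better choice.

```c
#include <stddef.h>

/* Encode one code point as UTF-8 into out; returns bytes written (1 to 4). */
static size_t put_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Read one 16-bit code unit from two bytes in the given byte order. */
static unsigned long read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? ((unsigned long)p[0] << 8) | p[1]
                      : ((unsigned long)p[1] << 8) | p[0];
}

/* Convert n bytes of UTF-16 (no BOM handling) to UTF-8.
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 * A trailing odd byte is ignored. */
static size_t utf16_to_utf8(const unsigned char *in, size_t n, int big_endian,
                            unsigned char *out, size_t out_cap)
{
    size_t i = 0;
    size_t o = 0;

    while (i + 1 < n) {
        unsigned long u = read_unit(in + i, big_endian);
        i += 2;

        /* A high surrogate followed by a low surrogate forms one code point. */
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n) {
            unsigned long low = read_unit(in + i, big_endian);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF)
            u = 0xFFFD;         /* unpaired surrogate: replacement character */

        if (out_cap - o < 4)    /* conservative: room for the longest form */
            return (size_t)-1;
        o += put_utf8(u, out + o);
    }
    return o;
}
```

For text that is all ASCII, the output is simply the input with the zero bytes removed, which is the 8-bit form the ordinary string functions expect.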
 
sore eyes

Thanks for the help, Santosh and Ian.

I've been able to work around the problem for the time being by:
1) copying the html file from my editor to the Windows clipboard
2) pasting the file into Notepad
3) saving the file as a txt file
4) renaming the txt file to my original html file name.
Apparently that sequence translates the characters into the 8-bit
format that my program needs. I would prefer knowing the correct way
to handle the 16-bit characters, but at least this will allow me to
get working again on the program's logic. I did try replacing char
with wchar, but the Watcom compiler didn't recognize that type.
Thanks again for the assistance.
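
Once the text is in an 8-bit form, the tag replacement itself can be done with the ordinary string functions. The sketch below is one possible shape for it: `replace_all` is a hypothetical helper, and the tags in the usage note are made-up examples rather than the ones from the actual file. It only works on 8-bit text, because the zero bytes in UTF-16 data would be taken as string terminators by strstr and strlen.

```c
#include <stddef.h>
#include <string.h>

/* Copy line into out, replacing every occurrence of from with to.
 * Returns 0 on success, -1 if from is empty or out (cap bytes) is too small.
 * replace_all is a hypothetical name used only for this sketch. */
static int replace_all(const char *line, const char *from, const char *to,
                       char *out, size_t cap)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t used = 0;

    if (from_len == 0 || cap == 0)
        return -1;

    while (*line != '\0') {
        const char *hit = strstr(line, from);
        /* Text to keep before the next match, or the whole remainder. */
        size_t keep = hit ? (size_t)(hit - line) : strlen(line);
        size_t add = hit ? to_len : 0;

        if (keep + add >= cap - used)   /* leave room for the terminator */
            return -1;
        memcpy(out + used, line, keep);
        used += keep;
        memcpy(out + used, to, add);
        used += add;
        line += keep + (hit ? from_len : 0);
    }
    out[used] = '\0';
    return 0;
}
```

For instance, replace_all(buf, "<b>", "<strong>", fixed, sizeof fixed) could be called on each line read by fgets before the result is written with fputs.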
 
Barry Schwarz

Here's the problem: when I debug this and watch the input buffer
buf[], I don't see a stream of characters; instead I see a sequence
of integers. I suppose these are probably the characters, but every
other integer in the array is a 0. I am guessing the file uses 16-bit
characters and my program thinks they are 8-bit. I tried defining
UNICODE = 1 in my project settings, but that didn't work.

How was newcode.html built? Have you looked at it with a hex editor
to see what it really contains?

Have you tried to create a simple text file whose contents you know
and test your program on that?
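
If no hex editor is at hand, a few lines of C can show what the file really contains. This is a minimal sketch, with `dump_head` as a hypothetical helper name; it opens the file in binary mode so that the bytes are shown exactly as stored.

```c
#include <stdio.h>

/* Write up to count leading bytes of the file at path to out as two-digit
 * hex values, 16 per line. Returns the number of bytes dumped, or -1 if
 * the file cannot be opened.
 * dump_head is a hypothetical name used only for this sketch. */
static long dump_head(const char *path, long count, FILE *out)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    int c;

    if (fp == NULL)
        return -1;

    while (n < count && (c = getc(fp)) != EOF) {
        if (n > 0)
            fputc(n % 16 == 0 ? '\n' : ' ', out);   /* separator before byte */
        fprintf(out, "%02X", (unsigned)c);
        n++;
    }
    if (n > 0)
        fputc('\n', out);

    fclose(fp);
    return n;
}
```

Calling dump_head("newcode.html", 32, stdout) prints the first 32 bytes. A file beginning with the byte values 0, 71, 0, 76 would show up as 00 47 00 4C, and a leading FF FE or FE FF would indicate a UTF-16 byte order mark.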


 
