How to delete a character in a file?


S!mb@

Hi all,

I'm currently developing a tool to convert text files between Linux,
Windows and Mac.

The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.

car = fgetc(myFile);
while (car != EOF) {
    if (car == 13) {
        car2 = fgetc(myFile);
        if (car2 == 10) {
            // fseek back 2 characters
            // delete a character
            // overwrite the second character
        }
    }
}

How can I do that? Is there a function that I can use? I can't find
one in stdio.h.

thx in advance,

Jerem.
 

Jens.Toerring

S!mb@ said:
I'm currently developing a tool to convert text files between Linux,
Windows and Mac.
The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.
car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {

Better use '\r' instead of some "magic" values.
car2 = fgetc(myFile) ;
if (car2 == 10) {

And that would be '\n'. BTW, when you open the file in text mode
you may never "see" the '\r' and '\n' as two separate characters
if the "\r\n" combination is the end of line marker on the system.
// fseek of 2 characters
// delete a character
// overwrite the second character
how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

See the FAQ, section 19.14. In short, you can't delete something from
the middle of a file, you have to copy everything except the stuff you
don't want to a new file.
Regards, Jens
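A minimal sketch of the copy-based approach Jens describes (the function and variable names are made up for illustration, and it assumes both streams are opened in binary mode so the C library does no line-ending translation of its own):

```c
#include <stdio.h>

/* Copy `in` to `out`, converting CRLF line endings to LF.
 * Both streams must be opened in binary mode ("rb"/"wb"),
 * otherwise the library may translate line endings itself. */
void crlf_to_lf(FILE *in, FILE *out)
{
    int c = fgetc(in);
    while (c != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next == '\n') {
                fputc('\n', out);       /* CRLF -> LF */
            } else {
                fputc('\r', out);       /* lone CR: keep it */
                if (next != EOF)
                    ungetc(next, in);   /* one pushback is guaranteed */
            }
        } else {
            fputc(c, out);
        }
        c = fgetc(in);
    }
}
```

The caller would open the source with "rb", the destination with "wb", and finally replace the original file with the copy (e.g. via remove() and rename()).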
 

Francois Grieu

Better use '\r' instead of some "magic" values.

For traditional MacOS compilers, '\r' tends to be 10,
and '\n' tends to be 13. This illustrates that when dealing
with binary files in a non-native format, it is best to use magic
values. OTOH, when dealing with local text files, '\n' is
best, of course.


François Grieu
 

Madhur Ahuja

S!mb@ said:
Hi all,

I'm currently developing a tool to convert text files between Linux,
Windows and Mac.

The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.

car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {
car2 = fgetc(myFile) ;
if (car2 == 10) {
// fseek of 2 characters
// delete a character
// overwrite the second character
}
}
}

how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

thx in advance,

Jerem.

Well, there is already a tool, dos2unix (and vice versa). Why reinvent the
wheel? Think of something new.

--
Winners don't do different things, they do things differently.

Madhur Ahuja
India

Homepage : http://madhur.netfirms.com
Email : madhur<underscore>ahuja<at>yahoo<dot>com
 

Jens.Toerring

Francois Grieu said:
For traditional MacOS compilers, '\r' tends to be 10,
and '\n' tends to be 13. This illustrates that when dealing
with binary files in a non-native format, it is best to use magic
values. OTOH, when dealing with local text files, '\n' is
best, of course.

I don't believe that; they were also using ASCII. AFAIR on "classical"
MacOS the end of line marker was simply "\n\r" (i.e. the other way
round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
== 0xA and '\n' (LF) == 0xD.
Regards, Jens
 

Alan Balmer

Well, there is already a tool, dos2unix and vice versa. Why reinvent the
wheel. Think something new.

Didn't you just negate your own comment? <G>.

Maybe the OP is doing it differently.
 

Peter Nilsson

Jens.Toerring said:
I don't believe that, they were also using ASCII.

Believe it, although it wasn't the hard and fast rule that Francois makes it out to be. Many
implementations (e.g. Metrowerks) allowed the programmer to optionally swap the values of
'\n' and '\r' for text streams. Choosing '\n' == 0x0D meant that text streams were
unencumbered with EOL translations.

The standard states that '\n' is an implementation defined value (whether on ASCII based
platforms or not) precisely for support of such systems.

[OT: That said, third party mac compilers had no support for command line arguments, since
Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
compilers were not exactly conforming in the strictest sense.

Compiling command line programs generally involved including a ccommand(&argv) call from
main. Curiously, every development tool that I used (I've never used MPW) got the runtime
startup for command line programs 'wrong' since a main signature of...

int main(int argc, char **argv)

...invariably meant that argc and argv were located below the stack. (The int was returned
in register D0, so that didn't matter.) Fortunately the memory was the top of the
'application globals', a location 'reserved' by Apple, but never used AFAIK!]
AFAIR on "classical"
MacOS the end of line marker was simply "\n\r"

The end of line marker was a lone <CR> (0x0D).

I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.
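Since classic MacOS marks the end of a line with a lone CR, converting such a file to Unix linebreaks is a plain byte substitution. A minimal sketch (the function name is made up; this assumes an ASCII host where '\r' is 13 and '\n' is 10, and streams opened in binary mode):

```c
#include <stdio.h>

/* Copy `in` to `out`, converting classic-Mac line endings
 * (lone CR) to Unix (LF). Streams must be in binary mode. */
void cr_to_lf(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF)
        fputc(c == '\r' ? '\n' : c, out);
}
```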
 

Gordon Burditt

I'm currently developping a tool to convert texts files between linux,
windows and mac.

the end of a line is coded by 2 characters in windows, and only one in
unix & mac. So I have to delete a character at each end of a line.

The portable way to make such changes is to copy the file and
make changes as you go. There is no portable way to shorten a file
to a length greater than zero except by truncating it to zero length
and then writing new contents for it. Functions such as ftruncate(),
chsize(), and suck() are not portable ANSI C.

Making changes in-place in a file should be done carefully. If
your program crashes partway through, it may leave an unrecoverable
mess.
car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {
car2 = fgetc(myFile) ;
if (car2 == 10) {
// fseek of 2 characters
// delete a character
// overwrite the second character
}
}
}

how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

A function which deletes a character out of a gigabyte file by
copying all but one character of the file may run very slowly
(although it is possible to write such a function portably if you've
got space for a copy of the file). If it's called once per line,
it could get REALLY, REALLY slow.

Gordon L Burditt
 

Gordon Burditt

Peter Nilsson said:
The end of line marker was a lone <CR> (0x0D).
I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.

It does, although I prefer to call them UNIX linebreaks.

Gordon L. Burditt
 

S!mb@

Gordon said:
It does, although I prefer to call them UNIX linebreaks.

Gordon L. Burditt

ok ;)

And what about the other characters on OS X?
I mean characters between 128 and 255. Do they use the Unix or the Mac
coding?

i.e. £ is 0xA3 (163) on Mac and 0x9C (156) on Unix. What about on OS X?
 

Richard Bos

Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command line arguments, since
Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
compilers were not exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.
Compiling command line programs generally involved including a ccommand(&argv) call from
main.

That, however, _would_ make it non-conforming.

Richard
 

S!mb@

Well, there is already a tool, dos2unix and vice versa. Why reinvent the
wheel. Think something new.

I had a look on Google to find a tool, but I didn't find an interesting one.
Most of them only convert LF and CR characters, but I also need to convert
characters above 128. I also need the source code, to adapt the
interface to my program.

But if you know well coded and powerful tools, I am interested.

Jerem.
 

Jens.Toerring

I had a look on Google to find a tool, but I didn't find an interesting one.
Most of them only convert LF and CR characters, but I also need to convert
characters above 128. I also need the source code, to adapt the
interface to my program.

That's not as simple as you seem to imagine - there are several different
standard (plus an even larger set of non-standard) interpretations for
the characters in that range. Just do a google search for e.g. "iso-8859"
to see just a few ways that range has been used. And there already exists
a tool for that purpose, it's called "recode".

Regards, Jens
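For single-byte charsets, the high-byte conversion the OP wants boils down to a 128-entry lookup table. A sketch (nothing here is a real mapping - the table contents must come from the actual charset definitions, e.g. what recode or the iso-8859 tables specify):

```c
#include <stdio.h>

/* Remap single bytes >= 128 through a translation table:
 * table[i] gives the output byte for input byte 128 + i.
 * This only works for 8-bit charsets; anything multi-byte
 * (UTF-8 etc.) needs a real conversion library. */
void remap_high_bytes(FILE *in, FILE *out,
                      const unsigned char table[128])
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c >= 128)
            fputc(table[c - 128], out);
        else
            fputc(c, out);      /* ASCII range passes through */
    }
}
```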
 

Peter Nilsson

S!mb@ said:
ok ;)

And what about the other characters on OS X?

What about them?
I mean characters between 128 and 255. Do they use the Unix or the Mac
coding?

They use whatever coding the program that wrote them used.
i.e. £ is 0xA3 (163) on mac and 0x9C (156) on unix. What about on OS X ?

Either system would be (and I presume is) capable of interpreting the given text file
under a given charset. Even within a C implementation you may be able to switch between
locales to interpret the same file differently under two different codings.
 

Peter Nilsson

Richard Bos said:
Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command
line arguments, since Apple's MPW was the only environment that
actually provided the notion of a 'shell'. So compilers were not
exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.

But the implementations I used didn't support that signature for main; what
you got for argc and argv was unspecified!
That, however, _would_ make it non-conforming.

The call would make a program not _strictly_ conforming, although it may be
(and was) conforming. The behaviour of such programs says nothing about the
_implementation's_ conformance.
 

Richard Bos

Peter Nilsson said:
Richard Bos said:
Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command
line arguments, since Apple's MPW was the only environment that
actually provided the notion of a 'shell'. So compilers were not
exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.

But the implementations I used didn't support that signature for main; what
you got for argc and argv was unspecified!

Ah, but that's a different matter. If int main(int argc, char **argv) is
not supported, _that_ does mean that the implementation does not conform
to the Standard, at least if it claims to be a hosted implementation.
But not having a command line doesn't make this inevitable.
The call would make a program not _strictly_ conforming, although it may be
(and was) conforming. The behaviour of such programs says nothing about the
_implementation's_ conformance.

Well, yes, it does; ccommand is reserved for the programmer, not for the
implementation.

Richard
 

S!mb@

And that would be '\n'. BTW, when you open the file in text mode
you may never "see" the '\r' and '\n' as two separate characters
if the "\r\n" combination is the end of line marker on the system.

When I use a hexadecimal editor, I "see" both characters.
That's why my program tries to read 2 characters (with 2 fgetc calls).

In fact, this works perfectly on Linux (compiled with gcc), but on
Windows (with the Borland bcc32 compiler), my program doesn't detect \r\n as
two separate characters, as you told me.

So... how can I detect "\r\n", the EOL on Windows?

Jerem
 

Jens.Toerring

When I use a hexadecimal editor, I "see" both characters.
That's why my program tries to read 2 characters (with 2 fgetc calls).
In fact, this works perfectly on Linux (compiled with gcc), but on
Windows (with the Borland bcc32 compiler), my program doesn't detect \r\n as
two separate characters, as you told me.
So... how can I detect "\r\n", the EOL on Windows?

On Windows, when you have opened the file in text mode, the "\r\n"
sequence will be returned as a single '\n', because in text mode it
signifies the EOL - and in order to make dealing with text files as
portable as possible the C functions return a '\n' for whatever the
EOL character or character sequence is on the system the program is
running on (as long as the file has been opened in text mode). So the
obvious solution is to open the file in binary mode (i.e. with "rb" as
the second argument to fopen() when you want to open the file for
reading) whenever you need to see what's really in the file without
special handling of certain characters (the character with the numeric
value 0x1A is another such character with a special meaning for text
files on Windows).

The "problem" does not seem to exist for you on Linux because there
the character signifying an EOL is identical to the '\n' the C
functions are returning, so on Linux (and other Unices) there isn't
any difference between opening a file in text or binary mode.

Regards, Jens
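A small sketch of what Jens suggests: with the file opened in "rb" mode, the same fgetc() loop sees '\r' and '\n' as two separate bytes on every platform. (The function name is made up for illustration.)

```c
#include <stdio.h>

/* Count CRLF pairs in `in`. The stream must have been opened
 * in binary mode ("rb"); in text mode on Windows the library
 * folds each "\r\n" into a single '\n' before we see it. */
long count_crlf(FILE *in)
{
    long pairs = 0;
    int c = fgetc(in);
    while (c != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next == '\n')
                pairs++;                /* consumed the pair */
            else if (next != EOF)
                ungetc(next, in);       /* lone CR: push back */
        }
        c = fgetc(in);
    }
    return pairs;
}
```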
 

Francois Grieu

I don't believe that, they were also using ASCII. AFAIR on "classical"
MacOS the end of line marker was simply "\n\r" (i.e. the other way
round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
== 0xA and '\n' (LF) == 0xD.

OT: I am 100% positive that traditional MacOS (up to and including
MacOS 9) uses the byte with value 13 to separate text lines, with no 10.
You can check that this is the encoding used in e.g.
<ftp://ftp.apple.com/developer/+LICENSE_READ_ME_FIRST>
This is how the traditional MacOS version of gzip decompresses text files.
This is the encoding used by e.g. TeachText and SimpleText, and all
versions of Microsoft Word when dealing with text files, and..

Getting back on topic: Apple's own C compilers, part of the MPW Shell,
indeed define '\n' as 13, and '\r' as 10. This is NOT an option (contrary
to other compilers). This causes no porting problem with most code.

[OT: there are headaches when moving files across a network. The
worst is that for diacriticals such as eacute encoded in a byte, Apple
has used FOUR different encodings on the Apple II, Lisa, traditional MacOS,
and MacOS X; and none of these is the same as in DOS].


François Grieu
 

Old Wolf

Peter Nilsson said:
Believe it, although it wasn't the hard and fast rule that Francois
makes it out to be. Many implementations (e.g. Metrowerks) allowed the
programmer to optionally swap the values of '\n' and '\r' for text
streams. Choosing '\n' == 0x0D meant that text streams were
unencumbered with EOL translations.

It sounds like you are describing conversion of '\n' to '\r' and vice
versa when a stream is open in text mode, which would be quite normal.
In fact it's the reason for having text mode and binary mode.

Francois claimed that '\r' was actually 10, ie. the following:

printf("%d\n", '\r');

would print 10. This is a totally different claim (which also
implies that the system is non-ASCII).
I'd have to see it to believe it..
 
