How to delete a character in a file?


S!mb@

Hi all,

I'm currently developing a tool to convert text files between Linux,
Windows and Mac.

The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.

car = fgetc(myFile);
while (car != EOF) {
    if (car == 13) {
        car2 = fgetc(myFile);
        if (car2 == 10) {
            // fseek back 2 characters
            // delete a character
            // overwrite the second character
        }
    }
}

How can I do that? Is there a function that I can use? I can't find
one in stdio.h.

thx in advance,

Jerem.
 

Jens.Toerring

S!mb@ said:
I'm currently developing a tool to convert text files between Linux,
Windows and Mac.
The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.
car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {

Better use '\r' instead of some "magic" values.
car2 = fgetc(myFile) ;
if (car2 == 10) {

And that would be '\n'. BTW, when you open the file in text mode
you may never "see" the '\r' and '\n' as two separate characters
if the "\r\n" combination is the end of line marker on the system.
// fseek of 2 characters
// delete a character
// overwrite the second character
how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

See the FAQ, section 19.14. In short, you can't delete something from
the middle of a file, you have to copy everything except the stuff you
don't want to a new file.
Regards, Jens
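A minimal sketch of the copy-based approach Jens describes (the function and variable names are made up for illustration, and it assumes both streams are opened in binary mode so the C library does no line-ending translation of its own):

```c
#include <stdio.h>

/* Copy `in` to `out`, converting CRLF line endings to LF.
 * Both streams must be opened in binary mode ("rb"/"wb"),
 * otherwise the library may translate line endings itself. */
void crlf_to_lf(FILE *in, FILE *out)
{
    int c = fgetc(in);
    while (c != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next == '\n') {
                fputc('\n', out);       /* CRLF -> LF */
            } else {
                fputc('\r', out);       /* lone CR: keep it */
                if (next != EOF)
                    ungetc(next, in);   /* one pushback is guaranteed */
            }
        } else {
            fputc(c, out);
        }
        c = fgetc(in);
    }
}
```

The caller would open the source with "rb", the destination with "wb", and finally replace the original file with the copy (e.g. via remove() and rename()).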
 

Francois Grieu

Better use '\r' instead of some "magic" values.

For traditional MacOS compilers, '\r' tends to be 10,
and '\n' tends to be 13. This illustrates that when dealing
with binary files in a non-native format, it is best to use magic
values. OTOH, when dealing with local text files, '\n' is
best, of course.


François Grieu
 

Madhur Ahuja

S!mb@ said:
Hi all,

I'm currently developing a tool to convert text files between Linux,
Windows and Mac.

The end of a line is coded by two characters on Windows, and only one on
Unix & Mac. So I have to delete a character at each end of line.

car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {
car2 = fgetc(myFile) ;
if (car2 == 10) {
// fseek of 2 characters
// delete a character
// overwrite the second character
}
}
}

how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

thx in advance,

Jerem.

Well, there is already a tool, dos2unix (and vice versa). Why reinvent the
wheel? Think of something new.

--
Winners don't do different things, they do things differently.

Madhur Ahuja
India

Homepage : http://madhur.netfirms.com
Email : madhur<underscore>ahuja<at>yahoo<dot>com
 

Jens.Toerring

Francois Grieu said:
For traditional MacOS compilers, '\r' tends to be 10,
and '\n' tends to be 13. This illustrates that when dealing
with binary files in a non-native format, it is best to use magic
values. OTOH, when dealing with local text files, '\n' is
best, of course.

I don't believe that; they were also using ASCII. AFAIR on "classical"
MacOS the end of line marker was simply "\n\r" (i.e. the other way
round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
== 0xA and '\n' (LF) == 0xD.
Regards, Jens
 

Alan Balmer

Well, there is already a tool, dos2unix and vice versa. Why reinvent the
wheel. Think something new.

Didn't you just negate your own comment? <G>.

Maybe the OP is doing it differently.
 

Peter Nilsson

Jens.Toerring said:
I don't believe that, they were also using ASCII.

Believe it, although it wasn't the hard and fast rule that Francois makes it out to be. Many
implementations (e.g. Metrowerks) allowed the programmer to optionally swap the values of
'\n' and '\r' for text streams. Choosing '\n' == 0x0D meant that text streams were
unencumbered with EOL translations.

The standard states that '\n' is an implementation defined value (whether on ASCII based
platforms or not) precisely for support of such systems.

[OT: That said, third party mac compilers had no support for command line arguments, since
Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
compilers were not exactly conforming in the strictest sense.

Compiling command line programs generally involved including a ccommand(&argv) call from
main. Curiously, every development tool that I used (I've never used MPW) got the runtime
startup for command line programs 'wrong' since a main signature of...

int main(int argc, char **argv)

...invariably meant that argc and argv were located below the stack. (The int was returned
in register D0, so that didn't matter.) Fortunately the memory was the top of the
'application globals', a location 'reserved' by Apple, but never used AFAIK!]
AFAIR on "classical"
MacOS the end of line marker was simply "\n\r"

The end of line marker was a lone <CR> (0x0D).

I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.
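Since classic MacOS marks the end of a line with a lone CR, converting such a file to Unix linebreaks is a plain byte substitution. A minimal sketch (the function name is made up; this assumes an ASCII host where '\r' is 13 and '\n' is 10, and streams opened in binary mode):

```c
#include <stdio.h>

/* Copy `in` to `out`, converting classic-Mac line endings
 * (lone CR) to Unix (LF). Streams must be in binary mode. */
void cr_to_lf(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF)
        fputc(c == '\r' ? '\n' : c, out);
}
```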
 

Gordon Burditt

I'm currently developping a tool to convert texts files between linux,
windows and mac.

the end of a line is coded by 2 characters in windows, and only one in
unix & mac. So I have to delete a character at each end of a line.

The portable way to make such changes is to copy the file and
make changes as you go. There is no portable way to shorten a file
to a length greater than zero except by truncating it to zero length
and then writing new contents for it. Functions such as ftruncate(),
chsize(), and suck() are not portable ANSI C.

Making changes in-place in a file should be done carefully. If
your program crashes partway through, it may leave an unrecoverable
mess.
car = fgetc(myFile);
while (car != EOF) {
if (car == 13) {
car2 = fgetc(myFile) ;
if (car2 == 10) {
// fseek of 2 characters
// delete a character
// overwrite the second character
}
}
}

how can I do that ? is there a function that I can use ? I can't find
one in stdio.h

A function which deletes a character out of a gigabyte file by
copying all but one character of the file may run very slowly
(although it is possible to write such a function portably if you've
got space for a copy of the file). If it's called once per line,
it could get REALLY, REALLY slow.

Gordon L Burditt
 

Gordon Burditt

Peter Nilsson said:
The end of line marker was a lone <CR> (0x0D).
I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.

It does, although I prefer to call them UNIX linebreaks.

Gordon L. Burditt
 

S!mb@

Gordon said:
It does, although I prefer to call them UNIX linebreaks.

Gordon L. Burditt

ok ;)

And what about the other characters on OS X?
I mean characters between 128 and 255. Do they use the Unix or the Mac
coding?

i.e. £ is 0xA3 (163) on Mac and 0x9C (156) on Unix. What about on OS X?
 

Richard Bos

Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command line arguments, since
Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
compilers were not exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.
Compiling command line programs generally involved including a ccommand(&argv) call from
main.

That, however, _would_ make it non-conforming.

Richard
 

S!mb@

Well, there is already a tool, dos2unix and vice versa. Why reinvent the
wheel. Think something new.

I had a look on Google to find a tool, but I didn't find an interesting one.
Most of them only convert LF and CR characters, but I also need to convert
characters above 128. I also need the source code, to adapt the
interface to my program.

But if you know well coded and powerful tools, I am interested.

Jerem.
 

Jens.Toerring

I had a look on Google to find a tool, but I didn't find an interesting one.
Most of them only convert LF and CR characters, but I also need to convert
characters above 128. I also need the source code, to adapt the
interface to my program.

That's not as simple as you seem to imagine - there are several different
standard (plus an even larger set of non-standard) interpretations for
the characters in that range. Just do a google search for e.g. "iso-8859"
to see just a few ways that range has been used. And there already exists
a tool for that purpose, it's called "recode".

Regards, Jens
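For single-byte charsets, the high-byte conversion the OP wants boils down to a 128-entry lookup table. A sketch (nothing here is a real mapping - the table contents must come from the actual charset definitions, e.g. what recode or the iso-8859 tables specify):

```c
#include <stdio.h>

/* Remap single bytes >= 128 through a translation table:
 * table[i] gives the output byte for input byte 128 + i.
 * This only works for 8-bit charsets; anything multi-byte
 * (UTF-8 etc.) needs a real conversion library. */
void remap_high_bytes(FILE *in, FILE *out,
                      const unsigned char table[128])
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c >= 128)
            fputc(table[c - 128], out);
        else
            fputc(c, out);      /* ASCII range passes through */
    }
}
```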
 

Peter Nilsson

S!mb@ said:
ok ;)

And what about the other characters on OS X?

What about them?
I mean characters between 128 and 255. Do they use the Unix or the Mac
coding?

They use whatever coding the program that wrote them used.
i.e. £ is 0xA3 (163) on mac and 0x9C (156) on unix. What about on OS X ?

Either system would be (and I presume is) capable of interpreting the given text file
under a given charset. Even within a C implementation you may be able to switch between
locales to interpret the same file differently under two different codings.
 

Peter Nilsson

Richard Bos said:
Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command
line arguments, since Apple's MPW was the only environment that
actually provided the notion of a 'shell'. So compilers were not
exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.

But the implementations I used didn't support that signature for main; what
you got for argc and argv was unspecified!
That, however, _would_ make it non-conforming.

The call would make a program not _strictly_ conforming, although it may be
(and was) conforming. The behaviour of such programs says nothing about the
_implementation's_ conformance.
 

Richard Bos

Peter Nilsson said:
Richard Bos said:
Peter Nilsson said:
[OT: That said, third party mac compilers had no support for command
line arguments, since Apple's MPW was the only environment that
actually provided the notion of a 'shell'. So compilers were not
exactly conforming in the strictest sense.

There's no reason why not having a command line would make an
implementation non-conforming. It would mean that the first argument to
main() would always be 0 or 1, but that's all.

But the implementations I used didn't support that signature for main; what
you got for argc and argv was unspecified!

Ah, but that's a different matter. If int main(int argc, char **argv) is
not supported, _that_ does mean that the implementation does not conform
to the Standard, at least if it claims to be a hosted implementation.
But not having a command line doesn't make this inevitable.
The call would make a program not _strictly_ conforming, although it may be
(and was) conforming. The behaviour of such programs says nothing about the
_implementation's_ conformance.

Well, yes, it does; ccommand is reserved for the programmer, not for the
implementation.

Richard
 

S!mb@

And that would be '\n'. BTW, when you open the file in text mode
you may never "see" the '\r' and '\n' as two separate characters
if the "\r\n" combination is the end of line marker on the system.

When I use a hexadecimal editor, I "see" both characters.
That's why my program tries to read 2 characters (with 2 fgetc calls).

In fact, this works perfectly on Linux (compiled with gcc), but on
Windows (with the Borland bcc32 compiler), my program doesn't detect \r\n as
two separate characters, as you told me.

So... how can I detect "\r\n", the EOL on Windows?

Jerem
 

Jens.Toerring

When I use a hexadecimal editor, I "see" both characters.
That's why my program tries to read 2 characters (with 2 fgetc calls).
In fact, this works perfectly on Linux (compiled with gcc), but on
Windows (with the Borland bcc32 compiler), my program doesn't detect \r\n as
two separate characters, as you told me.
So... how can I detect "\r\n", the EOL on Windows?

On Windows, when you have opened the file in text mode, the "\r\n"
sequence will be returned as a single '\n', because in text mode it
signifies the EOL - and in order to make dealing with text files as
portable as possible the C functions return a '\n' for whatever the
EOL character or character sequence is on the system the program is
running on (as long as the file has been opened in text mode). So the
obvious solution is to open the file in binary mode (i.e. with "rb" as
the second argument to fopen() when you want to open the file for
reading) whenever you need to see what's really in the file without
special handling of certain characters (the character with the numeric
value 0x1A is another such character with a special meaning for text
files on Windows).

The "problem" does not seem to exist for you on Linux because there
the character signifying an EOL is identical to the '\n' the C
functions are returning, so on Linux (and other Unices) there isn't
any difference between opening a file in text or binary mode.

Regards, Jens
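A small sketch of what Jens suggests: with the file opened in "rb" mode, the same fgetc() loop sees '\r' and '\n' as two separate bytes on every platform. (The function name is made up for illustration.)

```c
#include <stdio.h>

/* Count CRLF pairs in `in`. The stream must have been opened
 * in binary mode ("rb"); in text mode on Windows the library
 * folds each "\r\n" into a single '\n' before we see it. */
long count_crlf(FILE *in)
{
    long pairs = 0;
    int c = fgetc(in);
    while (c != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            if (next == '\n')
                pairs++;                /* consumed the pair */
            else if (next != EOF)
                ungetc(next, in);       /* lone CR: push back */
        }
        c = fgetc(in);
    }
    return pairs;
}
```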
 

Francois Grieu

I don't believe that, they were also using ASCII. AFAIR on "classical"
MacOS the end of line marker was simply "\n\r" (i.e. the other way
round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
== 0xA and '\n' (LF) == 0xD.

OT: I am 100% positive that traditional MacOS (up to and including
MacOS 9) uses the byte with value 13 to separate text lines, with no 10.
You can check that this is the encoding used in e.g.
<ftp://ftp.apple.com/developer/+LICENSE_READ_ME_FIRST>
This is how the traditional MacOS version of gzip decompresses text files.
This is the encoding used by e.g. TeachText and SimpleText, and all
versions of Microsoft Word when dealing with text files, and..

Getting back on topic: Apple's own C compilers, part of the MPW Shell,
indeed define '\n' as 13, and '\r' as 10. This is NOT an option (contrary
to other compilers). This causes no porting problem with most code.

[OT: there are headaches when moving files across a network. The
worst is that for diacriticals such as eacute encoded in a byte, Apple
has used FOUR different encodings on the Apple II, Lisa, traditional MacOS,
and MacOS X; and none of these is the same as in DOS].


François Grieu
 

Old Wolf

Peter Nilsson said:
Believe it, although it wasn't the hard and fast rule that Francois
makes it out to be. Many implementations (e.g. Metrowerks) allowed the
programmer to optionally swap the values of '\n' and '\r' for text
streams. Choosing '\n' == 0x0D meant that text streams were
unencumbered with EOL translations.

It sounds like you are describing conversion of '\n' to '\r' and vice
versa when a stream is open in text mode, which would be quite normal.
In fact it's the reason for having text mode and binary mode.

Francois claimed that '\r' was actually 10, ie. the following:

printf("%d\n", '\r');

would print 10. This is a totally different claim (which also
implies that the system is non-ASCII).
I'd have to see it to believe it..
 
