how to delete a character in a file ?

Discussion in 'C Programming' started by S!mb@, Jul 19, 2004.

  1. S!mb@

    S!mb@ Guest

    Hi all,

    I'm currently developing a tool to convert text files between Linux,
    Windows and Mac.

    The end of a line is coded by two characters on Windows, and only one on
    Unix & Mac. So I have to delete a character at each end of line.

    car = fgetc(myFile);
    while (car != EOF) {
        if (car == 13) {
            car2 = fgetc(myFile);
            if (car2 == 10) {
                // fseek back 2 characters
                // delete a character
                // overwrite the second character
            }
        }
        car = fgetc(myFile);
    }

    how can I do that ? is there a function that I can use ? I can't find
    one in stdio.h

    thx in advance,

    Jerem.
     
    S!mb@, Jul 19, 2004
    #1

  2. S!mb@

    -berlin.de Guest

    S!mb@ <S!mb@nop> wrote:
    > I'm currently developing a tool to convert text files between Linux,
    > Windows and Mac.


    > The end of a line is coded by two characters on Windows, and only one on
    > Unix & Mac. So I have to delete a character at each end of line.


    > car = fgetc(myFile);
    > while (car != EOF) {
    > if (car == 13) {


    Better use '\r' instead of some "magic" values.

    > car2 = fgetc(myFile) ;
    > if (car2 == 10) {


    And that would be '\n'. BTW, when you open the file in text mode
    you may never "see" the '\r' and '\n' as two separate characters
    if the "\r\n" combination is the end of line marker on the system.

    > // fseek of 2 characters
    > // delete a character
    > // overwrite the second character


    > how can I do that ? is there a function that I can use ? I can't find
    > one in stdio.h


    See the FAQ, section 19.14. In short, you can't delete something from
    the middle of a file, you have to copy everything except the stuff you
    don't want to a new file.
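    A minimal sketch of the copy-everything approach the FAQ describes,
    assuming both streams are opened in binary mode; the function name is
    illustrative, not part of the original post:

```c
#include <stdio.h>

/* Copy 'in' to 'out', turning each "\r\n" pair into a single '\n'.
   A lone '\r' (old-Mac line end) is rewritten as '\n' as well. */
void crlf_to_lf(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\r') {
            int next = fgetc(in);
            fputc('\n', out);          /* one newline per line end */
            if (next != '\n' && next != EOF)
                ungetc(next, in);      /* lone '\r': keep the next byte */
        } else {
            fputc(c, out);
        }
    }
}
```

    The converted output then replaces the original file in a separate
    step; nothing is ever deleted in place.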
    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Jul 19, 2004
    #2

  3. In article <>, -berlin.de
    wrote:

    > S!mb@ <S!mb@nop> wrote:
    > > I'm currently developing a tool to convert text files between Linux,
    > > Windows and Mac.

    > (..)
    > > if (car == 13) {

    >
    > Better use '\r' instead of some "magic" values.


    For traditional MacOS compilers, '\r' tends to be 10,
    and '\n' tends to be 13. This illustrates that when dealing
    with binary files in a non-native format, it is best to use magic
    values. OTOH, when dealing with local text files, '\n' is
    best, of course.


    François Grieu
     
    Francois Grieu, Jul 19, 2004
    #3
  4. S!mb@

    Madhur Ahuja Guest

    "S!mb@" <S!mb@nop> wrote in message
    news:40fbe69d$0$15270$...
    > Hi all,
    >
    > I'm currently developing a tool to convert text files between Linux,
    > Windows and Mac.
    >
    > The end of a line is coded by two characters on Windows, and only one on
    > Unix & Mac. So I have to delete a character at each end of line.
    >
    > car = fgetc(myFile);
    > while (car != EOF) {
    > if (car == 13) {
    > car2 = fgetc(myFile) ;
    > if (car2 == 10) {
    > // fseek of 2 characters
    > // delete a character
    > // overwrite the second character
    > }
    > }
    > }
    >
    > how can I do that ? is there a function that I can use ? I can't find
    > one in stdio.h
    >
    > thx in advance,
    >
    > Jerem.


    Well, there is already a tool, dos2unix (and vice versa). Why reinvent the
    wheel? Think of something new.

    --
    Winners don't do different things, they do things differently.

    Madhur Ahuja
    India

    Homepage : http://madhur.netfirms.com
    Email : madhur<underscore>ahuja<at>yahoo<dot>com
     
    Madhur Ahuja, Jul 19, 2004
    #4
  5. S!mb@

    -berlin.de Guest

    Francois Grieu <> wrote:
    > In article <>, -berlin.de
    > wrote:


    >> S!mb@ <S!mb@nop> wrote:
    >> > I'm currently developing a tool to convert text files between Linux,
    >> > Windows and Mac.

    >> (..)
    >> > if (car == 13) {

    >>
    >> Better use '\r' instead of some "magic" values.


    > For traditional MacOS compilers, '\r' tends to be 10,
    > and '\n' tends to be 13. This illustrates that when dealing
    > with binary files in a non-native format, it is best to use magic
    > values. OTOH, when dealing with local text files, '\n' is
    > best, of course.


    I don't believe that, they were also using ASCII. AFAIR on "classical"
    MacOS the end of line marker was simply "\n\r" (i.e. the other way
    round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
    == 0xA and '\n' (LF) == 0xD.
    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Jul 19, 2004
    #5
  6. S!mb@

    Alan Balmer Guest

    On Tue, 20 Jul 2004 00:06:11 +0530, "Madhur Ahuja" <> wrote:

    >Well, there is already a tool, dos2unix (and vice versa). Why reinvent the
    >wheel? Think of something new.
    >
    >--
    >Winners don't do different things, they do things differently.


    Didn't you just negate your own comment? <G>.

    Maybe the OP is doing it differently.

    --
    Al Balmer
    Balmer Consulting
     
    Alan Balmer, Jul 19, 2004
    #6
  7. <-berlin.de> wrote in message news:...
    > Francois Grieu <> wrote:
    > > In article <>, -berlin.de
    > > wrote:

    >
    > >> S!mb@ <S!mb@nop> wrote:
    > >> > I'm currently developing a tool to convert text files between Linux,
    > >> > Windows and Mac.
    > >> (..)
    > >> > if (car == 13) {
    > >>
    > >> Better use '\r' instead of some "magic" values.

    >
    > > For traditional MacOS compilers, '\r' tends to be 10,
    > > and '\n' tends to be 13. This illustrates that when dealing
    > > with binary files in a non-native format, it is best to use magic
    > > values. OTOH, when dealign with local text files, '\n' is
    > > best, of course.

    >
    > I don't believe that, they were also using ASCII.


    Believe it, although it wasn't as hard and fast a rule as Francois makes it
    out to be. Many implementations (e.g. Metrowerks) allowed the programmer to
    optionally swap the values of '\n' and '\r' for text streams. Choosing
    '\n' == 0x0D meant that text streams were unencumbered with EOL translations.

    The standard states that '\n' is an implementation-defined value (whether on
    ASCII-based platforms or not) precisely to support such systems.

    [OT: That said, third party mac compilers had no support for command line arguments, since
    Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
    compilers were not exactly conforming in the strictest sense.

    Compiling command line programs generally involved including a ccommand(&argv) call from
    main. Curiously, every development tool that I used (I've never used MPW) got the runtime
    startup for command line programs 'wrong' since a main signature of...

    int main(int argc, char **argv)

    ...invariably meant that argc and argv were located below the stack. (The int
    was returned in register D0, so that didn't matter.) Fortunately the memory
    was the top of the 'application globals', a location 'reserved' by Apple, but
    never used AFAIK!]

    > AFAIR on "classical"
    > MacOS the end of line marker was simply "\n\r"


    The end of line marker was a lone <CR> (0x0D).

    I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.

    --
    Peter
     
    Peter Nilsson, Jul 20, 2004
    #7
  8. >I'm currently developing a tool to convert text files between Linux,
    >Windows and Mac.
    >
    >The end of a line is coded by two characters on Windows, and only one on
    >Unix & Mac. So I have to delete a character at each end of line.


    The portable way to make such changes is to copy the file and
    make changes as you go. There is no portable way to shorten a file
    to a length greater than zero except by truncating it to zero length
    and then writing new contents for it. Functions such as ftruncate()
    and chsize() are not portable ANSI C.

    Making changes in-place in a file should be done carefully. If
    your program crashes partway through, it may leave an unrecoverable
    mess.
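    One hedged way to follow that advice: write the converted output to a
    temporary file and replace the original only after the whole copy has
    succeeded. The function name and filenames below are hypothetical:

```c
#include <stdio.h>

/* Copy 'path' to 'tmp_path', dropping every '\r' byte, then replace
   the original with the converted copy. Returns 0 on success. */
int convert_safely(const char *path, const char *tmp_path)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(tmp_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    int c;
    while ((c = fgetc(in)) != EOF)
        if (c != '\r')               /* strip CR; keep everything else */
            fputc(c, out);

    fclose(in);
    if (fclose(out) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* The original is touched only after the copy is complete, so a
       crash mid-conversion leaves it intact. */
    if (remove(path) != 0 || rename(tmp_path, path) != 0)
        return -1;
    return 0;
}
```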

    >
    >car = fgetc(myFile);
    >while (car != EOF) {
    > if (car == 13) {
    > car2 = fgetc(myFile) ;
    > if (car2 == 10) {
    > // fseek of 2 characters
    > // delete a character
    > // overwrite the second character
    > }
    > }
    >}
    >
    >how can I do that ? is there a function that I can use ? I can't find
    >one in stdio.h


    A function which deletes a character out of a gigabyte file by
    copying all but one character of the file may run very slowly
    (although it is possible to write such a function portably if you've
    got space for a copy of the file). If it's called once per line,
    it could get REALLY, REALLY slow.

    Gordon L Burditt
     
    Gordon Burditt, Jul 20, 2004
    #8
  9. >The end of line marker was a lone <CR> (0x0D).
    >
    >I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.


    It does, although I prefer to call them UNIX linebreaks.

    Gordon L. Burditt
     
    Gordon Burditt, Jul 20, 2004
    #9
  10. S!mb@

    S!mb@ Guest

    Gordon Burditt wrote:
    >>The end of line marker was a lone <CR> (0x0D).
    >>
    >>I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks or not.

    >
    >
    > It does, although I prefer to call them UNIX linebreaks.
    >
    > Gordon L. Burditt


    ok ;)

    and what about the other characters on OS X ?
    I mean characters between 128 and 255. Do they use the Unix or the Mac
    encoding ?

    i.e. £ is 0xA3 (163) on Mac and 0x9C (156) on Unix. What about on OS X ?
     
    S!mb@, Jul 20, 2004
    #10
  11. S!mb@

    Richard Bos Guest

    "Peter Nilsson" <> wrote:

    > [OT: That said, third party mac compilers had no support for command line arguments, since
    > Apple's MPW was the only environment that actually provided the notion of a 'shell'. So
    > compilers were not exactly conforming in the strictest sense.


    There's no reason why not having a command line would make an
    implementation non-conforming. It would mean that the first argument to
    main() would always be 0 or 1, but that's all.

    > Compiling command line programs generally involved including a ccommand(&argv) call from
    > main.


    That, however, _would_ make it non-conforming.

    Richard
     
    Richard Bos, Jul 20, 2004
    #11
  12. S!mb@

    S!mb@ Guest

    > Well, there is already a tool, dos2unix and vice versa. Why reinvent the
    > wheel. Think something new.
    >


    I had a look on Google to find a tool, but I didn't find an interesting one.
    Most of them only convert LF and CR characters, while I also need to convert
    characters above 128. I also need the source code, to adapt the
    interface to my program.

    But if you know of well-coded and powerful tools, I am interested.

    Jerem.
     
    S!mb@, Jul 20, 2004
    #12
  13. S!mb@

    -berlin.de Guest

    S!mb@ <S!mb@nop> wrote:
    > > Well, there is already a tool, dos2unix (and vice versa). Why reinvent the
    > > wheel? Think of something new.
    >>


    > I had a look on Google to find a tool, but I didn't find an interesting one.
    > Most of them only convert LF and CR characters, while I also need to convert
    > characters above 128. I also need the source code, to adapt the
    > interface to my program.


    That's not as simple as you seem to imagine - there are several different
    standard (plus an even larger set of non-standard) interpretations of the
    characters in that range. Just do a Google search for e.g. "iso-8859"
    to see just a few of the ways that range has been used. And there already
    exists a tool for that purpose; it's called "recode".
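    For the bytes above 127, the usual technique is a per-byte lookup table
    built from the two charsets involved. A sketch follows; the two table
    entries filled in are purely illustrative, and real tables should come
    from the published charset definitions or from a tool like recode:

```c
#include <stdio.h>

/* 256-entry table: remap[b] is the output byte for input byte b. */
static unsigned char remap[256];

static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        remap[i] = (unsigned char)i;   /* start from the identity map */
    /* Hypothetical entries for one source/target charset pair: */
    remap[0x8E] = 0xE9;                /* say, source 0x8E -> target 0xE9 */
    remap[0x9F] = 0xFC;                /* say, source 0x9F -> target 0xFC */
}

static void transcode(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF)
        fputc(remap[(unsigned char)c], out);
}
```

    The ASCII range stays an identity mapping, so line-ending handling can
    be layered on the same pass.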

    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Jul 20, 2004
    #13
  14. "S!mb@" <S!mb@nop> wrote in message news:40fcd04d$0$29419$...
    > Gordon Burditt wrote:
    > > > The end of line marker was a lone <CR> (0x0D).
    > > >
    > > > I have no idea whether Mac OS X uses Linux (<LF> 0x0A) linebreaks
    > > > or not.

    > >
    > > It does, although I prefer to call them UNIX linebreaks.

    >
    > ok ;)
    >
    > and what about the other characters on OS X ?


    What about them?

    > I mean characters between 128 and 255. Do they use the Unix or the Mac
    > encoding ?


    They use whatever coding the program that wrote them used.

    > i.e. £ is 0xA3 (163) on Mac and 0x9C (156) on Unix. What about on OS X ?


    Either system would be (and I presume is) capable of interpreting a given
    text file under a given charset. Even within a C implementation you may be
    able to switch between locales to interpret the same file differently under
    two different encodings.

    --
    Peter
     
    Peter Nilsson, Jul 20, 2004
    #14
  15. "Richard Bos" <> wrote in message
    news:...
    > "Peter Nilsson" <> wrote:
    >
    > > [OT: That said, third party mac compilers had no support for command
    > > line arguments, since Apple's MPW was the only environment that
    > > actually provided the notion of a 'shell'. So compilers were not
    > > exactly conforming in the strictest sense.

    >
    > There's no reason why not having a command line would make an
    > implementation non-conforming. It would mean that the first argument to
    > main() would always be 0 or 1, but that's all.


    But the implementations I used didn't support that signature for main, what
    you got for argc and argv was unspecified!

    > > Compiling command line programs generally involved including a
    > > ccommand(&argv) call from main.

    >
    > That, however, _would_ make it non-conforming.


    The call would make a program not _strictly_ conforming, although it may be
    (and was) conforming. The behaviour of such programs says nothing about the
    _implementation's_ conformance.

    --
    Peter
     
    Peter Nilsson, Jul 20, 2004
    #15
  16. S!mb@

    Richard Bos Guest

    "Peter Nilsson" <> wrote:

    > "Richard Bos" <> wrote in message
    > news:...
    > > "Peter Nilsson" <> wrote:
    > >
    > > > [OT: That said, third party mac compilers had no support for command
    > > > line arguments, since Apple's MPW was the only environment that
    > > > actually provided the notion of a 'shell'. So compilers were not
    > > > exactly conforming in the strictest sense.

    > >
    > > There's no reason why not having a command line would make an
    > > implementation non-conforming. It would mean that the first argument to
    > > main() would always be 0 or 1, but that's all.

    >
    > But the implementations I used didn't support that signature for main, what
    > you got for argc and argv was unspecified!


    Ah, but that's a different matter. If int main(int argc, char **argv) is
    not supported, _that_ does mean that the implementation does not conform
    to the Standard, at least if it claims to be a hosted implementation.
    But not having a command line doesn't make this inevitable.

    > > > Compiling command line programs generally involved including a
    > > > ccommand(&argv) call from main.

    > >
    > > That, however, _would_ make it non-conforming.

    >
    > The call would make a program not _strictly_ conforming, although it may be
    > (and was) conforming. The behaviour of such programs says nothing about the
    > _implementation's_ conformance.


    Well, yes, it does; ccommand is reserved for the programmer, not for the
    implementation.

    Richard
     
    Richard Bos, Jul 20, 2004
    #16
  17. S!mb@

    S!mb@ Guest

    > And that would be '\n'. BTW, when you open the file in text mode
    > you may never "see" the '\r' and '\n' as two separate characters
    > if the "\r\n" combination is the end of line marker on the system.


    When I use a hexadecimal editor, I "see" both characters.
    That's why my program tries to read two characters (with two fgetc calls).

    In fact, this works perfectly on Linux (compiled with gcc), but on
    Windows (with the Borland bcc32 compiler), my program doesn't detect \r\n as
    two separate characters, as you told me.

    So... how can I detect "\r\n", the EOL on Windows ?

    Jerem
     
    S!mb@, Jul 20, 2004
    #17
  18. S!mb@

    -berlin.de Guest

    S!mb@ <S!mb@nop> wrote:
    >> And that would be '\n'. BTW, when you open the file in text mode
    >> you may never "see" the '\r' and '\n' as two separate characters
    >> if the "\r\n" combination is the end of line marker on the system.


    > When I use an hexadecimal editor, I "see" both characters.
    > that's why my program tries to read 2 characters (with 2 fgetc).


    > in fact, this works perfectly on linux (compiled with gcc), but on
    > windows (with Borland bcc32 compiler), my program doesn't detect \r\n as
    > two separate characters, as you told me.


    > So... how can I detect "\r\n", the EOL in windows ?


    On Windows, when you have opened the file in text mode, the "\r\n"
    sequence will be returned as a single '\n' because in text mode it
    signifies the EOL - and in order to make dealing with text files as
    portable as possible the C functions return a '\n' for whatever
    the EOL character or character sequence is on the system the
    program is running on (as long as the file has been opened in text
    mode). So the obvious solution is to open the file in binary mode
    (i.e. with "rb" as the second argument to fopen() when you want to
    open the file for reading) whenever you need to see what's really
    in the file, without any special handling of certain characters (the
    character with the numeric value 0x1A is another character that has
    a special meaning for text files on Windows).

    The "problem" does not seem to exist for you on Linux because there
    the character signifying an EOL is identical to the '\n' the C
    functions return, so on Linux (and other Unices) there isn't
    any difference between opening a file in text or binary mode.
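    That point can be checked with a short sketch: opened with "rb", the two
    bytes of a Windows line ending stay visible on every platform. The
    function name here is illustrative:

```c
#include <stdio.h>

/* Count "\r\n" pairs in a file. Binary mode ("rb") suppresses the
   text-mode EOL translation, so CR and LF arrive as separate bytes. */
long count_crlf(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    long count = 0;
    int prev = EOF, c;
    while ((c = fgetc(f)) != EOF) {
        if (prev == '\r' && c == '\n')
            count++;
        prev = c;
    }
    fclose(f);
    return count;
}
```

    Compiled with gcc or bcc32, this reports the same count for the same
    file, unlike a version that opens the file in text mode.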

    Regards, Jens
    --
    \ Jens Thoms Toerring ___ -berlin.de
    \__________________________ http://www.toerring.de
     
    -berlin.de, Jul 20, 2004
    #18
  19. In article <>, -berlin.de
    wrote:

    > > For traditional MacOS compilers, '\r' tends to be 10,
    > > and '\n' tends to be 13. This illustrates that when dealing
    > > with binary files in a non-native format, it is best to use magic
    > > values. OTOH, when dealign with local text files, '\n' is
    > > best, of course.

    >
    > I don't believe that, they were also using ASCII. AFAIR on "classical"
    > MacOS the end of line marker was simply "\n\r" (i.e. the other way
    > round compared to DOSish systems), but that doesn't make '\r' (i.e. CR)
    > == 0xA and '\n' (LF) == 0xD.


    OT: I am 100% positive that traditional MacOS (up to and including
    MacOS 9) uses the byte with value 13 to separate text lines, with no 10.
    You can check that this is the encoding used in e.g.
    <ftp://ftp.apple.com/developer/+LICENSE_READ_ME_FIRST>
    This is how the traditional MacOS version of gzip decompresses text files.
    This is the encoding used by e.g. TeachText and SimpleText, and all
    versions of Microsoft Word when dealing with text files, and...

    Getting back on topic: Apple's own C compilers, part of the MPW Shell,
    indeed define '\n' as 13 and '\r' as 10. This is NOT an option (contrary
    to other compilers). This causes no porting problem with most code.

    [OT: there are headaches when moving files across a network. The
    worst is that for diacriticals such as e-acute encoded in a byte, Apple
    has used FOUR different encodings on the Apple II, Lisa, traditional MacOS,
    and MacOSX; and none of these is the same as in DOS.]


    François Grieu
     
    Francois Grieu, Jul 20, 2004
    #19
  20. S!mb@

    Old Wolf Guest

    "Peter Nilsson" <> wrote:
    > > Francois Grieu <> wrote:

    >
    > > > For traditional MacOS compilers, '\r' tends to be 10,
    > > > and '\n' tends to be 13. This illustrates that when dealing
    > > > with binary files in a non-native format, it is best to use magic
    > > > values. OTOH, when dealign with local text files, '\n' is
    > > > best, of course.

    > >
    > > I don't believe that, they were also using ASCII.

    >
    > Believe it, although it wasn't as hard and fast a rule as Francois
    > makes it out to be. Many implementations (e.g. Metrowerks) allowed the
    > programmer to optionally swap the values of '\n' and '\r' for text
    > streams. Choosing '\n' == 0x0D meant that text streams were
    > unencumbered with EOL translations.


    It sounds like you are describing conversion of '\n' to '\r' and vice
    versa when a stream is opened in text mode, which would be quite normal.
    In fact it's the reason for having text mode and binary mode.

    Francois claimed that '\r' was actually 10, i.e. that the following:

    printf("%d\n", '\r');

    would print 10. This is a totally different claim (which also
    implies that the system is non-ASCII).
    I'd have to see it to believe it..
     
    Old Wolf, Jul 20, 2004
    #20
