Deleting first N lines from a text file

Discussion in 'C Programming' started by pozz, Nov 15, 2011.

  1. pozz

    pozz Guest

I want to delete the first N lines from a text file. I imagine two
approaches:
- use a temporary file to copy the last lines only
- use the same file to move characters starting from the N+1 line to the
beginning

The temporary file could be more complex to write (in the end I have to
delete the original file and rename the temporary one), but at any
    moment I have a coherent text file. So this approach is safe if the
    application crashes during the deleting process. If the application
    crashes just after deleting the original text file but before renaming
    the temporary file, during initialization I can detect this situation
    and proceed with the renaming.

    The second approach is simpler, but leaves a malformed text file on
    the filesystem if the application crashes during the deleting process.

    What do you think about those thoughts? Do you agree with me?

    My "deleting first N lines" function is:

int text_delete(unsigned int N)
{
    FILE *f, *ftmp;
    int c;

    f = fopen(filename, "r");       /* "rt" is non-standard; "r" is text mode */
    ftmp = fopen(tmpfilename, "w");
    if ((f == NULL) || (ftmp == NULL)) {
        if (f) fclose(f);
        if (ftmp) fclose(ftmp);
        return -1;
    }
    /* Skip the first N lines (guarding against N == 0 wrapping around) */
    while (N > 0 && (c = fgetc(f)) != EOF) {
        if (c == '\n')
            N--;
    }
    /* Copy the remainder to the temporary file */
    while ((c = fgetc(f)) != EOF)
        fputc(c, ftmp);
    fclose(f);
    if (fclose(ftmp) == EOF) return -1;     /* make sure the copy is complete */
    if (remove(filename) != 0) return -1;   /* remove/rename return non-zero on failure */
    if (rename(tmpfilename, filename) != 0) return -1;
    return 0;
}

At initialization I try to open the text file or the temporary file:

int text_init(void)
{
    FILE *f;

    f = fopen(filename, "r");
    if (f == NULL) {
        /* Does the temporary file exist? */
        f = fopen(tmpfilename, "r");
        if (f != NULL) {
            /* Yes! Recover the temporary file */
            fclose(f);
            if (rename(tmpfilename, filename) != 0) return -1;
        } else {
            /* Create an empty log file... */
            f = fopen(filename, "w");
            if (f == NULL) return -1;
            fclose(f);
        }
    } else {
        fclose(f);
    }
    return 0;
}
    pozz, Nov 15, 2011
    #1

  2. pozz wrote:
    >I want to delete the first N lines from a file text.
    >...
    >The second approach is simpler,...
    >...
    >What do you think about those thoughts?


    Only that the second approach is not simpler.
    Also, depending on the underlying OS, it may not be possible to read
    from and write to the same file as you propose.

    --
    Roberto Waltman

    [ Please reply to the group,
    return address is invalid ]
    Roberto Waltman, Nov 15, 2011
    #2

  3. pozz

    Ben Pfaff Guest

    Acid Washed China Blue Jeans <> writes:

    > In article <>,
    > Roberto Waltman <> wrote:
    >
    >> pozz wrote:
    >> >I want to delete the first N lines from a file text.
    >> >...
    >> >The second approach is simpler,...
    >> >...
    >> >What do you think about those thoughts?

    >>
    >> Only that the second approach is not simpler.
    >> Also, depending on the underlying OS, it may not be possible to read
    >> from and write to the same file as you propose.

    >
    > Fopen with "r+". If fopen succeeds, the library has promised
    > you you are allowed to read and write an existing file.


    However, writing in a text file may truncate it, see 7.19.3
    "Files":

    Whether a write on a text stream causes the associated file
    to be truncated beyond that point is implementation-defined.
    --
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
    Ben Pfaff, Nov 15, 2011
    #3
  4. Acid Washed China Blue Jeans wrote:

    >Fopen with "r+". If fopen succeeds, the library has promised you you are allowed
    >to read and write an existing file.


    In the general case, a write may truncate the file at the end of the
    written data, so it may be OK to read from a location before the last
    location written, but not after it.

    And there may be environments in which fopen(..., "r+") always fails.

    --
    Roberto Waltman

    [ Please reply to the group,
    return address is invalid ]
    Roberto Waltman, Nov 15, 2011
    #4
  5. pozz

    Eric Sosman Guest

    On 11/14/2011 7:02 PM, pozz wrote:
    > I want to delete the first N lines from a file text. I imagine two
    > approaches:
    > - use a temporary file to copy the last lines only


    Do this.

    > - use the same file to move characters starting from N+1 line to the
    > beginning


    Don't do this.

    > The temporary file could be more complex to write (at last I have to
    > delete the original file and rename the temporary file), but at any
    > moment I have a coherent text file. So this approach is safe if the
    > application crashes during the deleting process. If the application
    > crashes just after deleting the original text file but before renaming
    > the temporary file, during initialization I can detect this situation
    > and proceed with the renaming.
    >
    > The second approach is simpler, but leaves a malformed text file on
    > the filesystem if the application crashes during the deleting process.
    >
    > What do you think about those thoughts? Do you agree with me?


    No, not at all. One problem with your supposedly simpler
    solution: How do you tell subsequent readers of the file that they
    should stop before reaching the end? Observe that <stdio.h> offers
    no way to shorten an existing file to any length other than zero.
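    (For completeness: outside standard C, POSIX does provide a way to
    shorten a file in place. A minimal sketch, assuming a POSIX system;
    the name `shorten_file` is illustrative, and ftruncate/fileno are
    exactly the kind of facility <stdio.h> alone does not offer:)

    ```c
    /* Sketch: shortening a file in place is possible outside standard C.
     * Assumes a POSIX system; ftruncate and fileno are not in <stdio.h>. */
    #include <stdio.h>
    #include <unistd.h>  /* ftruncate, fileno (POSIX, not ISO C) */

    /* Shrink the file behind stream `f` to `new_len` bytes; 0 on success. */
    static int shorten_file(FILE *f, long new_len)
    {
        if (fflush(f) != 0)     /* push stdio's buffer out first */
            return -1;
        return ftruncate(fileno(f), (off_t)new_len) == 0 ? 0 : -1;
    }
    ```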

    --
    Eric Sosman
    Eric Sosman, Nov 15, 2011
    #5
  6. pozz

    jacob navia Guest

Using the containers library (and if your file fits in memory):

    #include <containers.h>
    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            printf("Usage: deletelines <file> <N>\n");
            return -1;
        }
        strCollection *data = istrCollection.CreateFromFile(argv[1]);
        if (data == NULL) return -1;
        istrCollection.RemoveRange(data, 0, atoi(argv[2]));
        istrCollection.WriteToFile(data, argv[1]);
        istrCollection.Finalize(data);
        return 0;
    }
    jacob navia, Nov 15, 2011
    #6
  7. pozz

    Giuseppe Guest

    On 15 Nov, 04:06, Eric Sosman <> wrote:
    > > What do you think about those thoughts? Do you agree with me?

    >
    >      No, not at all.  One problem with your supposedly simpler
    > solution: How do you tell subsequent readers of the file that they
    > should stop before reaching the end?  Observe that <stdio.h> offers
    > no way to shorten an existing file to any length other than zero.


Ok, I implemented the "temporary file" solution and it works well. The
only disadvantage is time: when the file is big (1000 lines of about
50 bytes each), the time to delete the first line could be very high.

Do you think the process could be reduced by launching an external script
(for example, 'head' based) with system()? If I redirect the output to
the original filename I could avoid the time-consuming process of copying
the original to the temporary file.
    Giuseppe, Nov 16, 2011
    #7
  8. Giuseppe <> writes:
    > On 15 Nov, 04:06, Eric Sosman <> wrote:
    >> > What do you think about those thoughts? Do you agree with me?

    >>
    >>      No, not at all.  One problem with your supposedly simpler
    >> solution: How do you tell subsequent readers of the file that they
    >> should stop before reaching the end?  Observe that <stdio.h> offers
    >> no way to shorten an existing file to any length other than zero.

    >
    > Ok, I implemented the "temporary file" solution and it works well.
    > The only disadvantage is time: when the file is big (1000 lines of
    > about 50 bytes each), the time to delete the first line could be very
    > high.


    A text file of 1000 lines of 50 bytes each really isn't all that big.
    The time to copy and rename it probably won't even be noticeable.

    > Do you think the process could be reduced launching an external script
    > (for example, 'head' based) with system()? If I redirect the output
    > to the original filename I could avoid the time consuming process of
    > copying the original to the temporary file.


The behavior of external programs is outside the scope of the C language.

    (But I'll mention that on Unix-like systems, running a command with its
    input and output directed to the same file can cause serious problems;
    it can easily end up reading a partially modified version of the file
    instead of the original. And even if it works, it's likely going to be
    doing the same thing you would have done in your program.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 16, 2011
    #8
  9. pozz

    Eric Sosman Guest

    On 11/15/2011 7:57 PM, Giuseppe wrote:
    > On 15 Nov, 04:06, Eric Sosman<> wrote:
    >>> What do you think about those thoughts? Do you agree with me?

    >>
    >> No, not at all. One problem with your supposedly simpler
    >> solution: How do you tell subsequent readers of the file that they
    >> should stop before reaching the end? Observe that<stdio.h> offers
    >> no way to shorten an existing file to any length other than zero.

    >
    > Ok, I implemented the "temporary file" solution and it works well.
    > The
    > only disadvantage is time: when the file is big (1000 lines of about
    > 50 bytes
    > each), the time to delete the first line could be very high.


    Fifty K shouldn't take long. Even on a system from forty years
    ago it didn't take long. Even on paper tape, for goodness' sake, it
    took less than a minute!

    For "really big" files (terabytes) copying most of the file from
    one place to another could take an unacceptably long time. Also, the
    need to find space for a second nearly complete copy could be
    troublesome. In such cases you'd be justified in seeking fancier
    solutions -- but I sincerely doubt that "slide all those terabytes
    a couple hundred positions leftward" would produce a savings. More
    likely it would produce a slowdown, plus the risks you've already
    mentioned about data loss in the event of an error. No, the fancier
    solution would probably involve some kind of an index external to the
    file, describing which parts of the file were "live" and which "dead,"
    and fancier routines to read just the live parts.
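    (The "external index" idea above can be sketched in a few lines: keep a
    side file holding the byte offset of the first live line, and "delete"
    the first N lines by just advancing that offset instead of rewriting the
    log. This is a hedged illustration, not something from the thread; the
    function name `log_skip_lines` and the index-file layout are invented.)

    ```c
    /* Sketch of an external index: a side file stores the byte offset of
     * the first live line, so deleting leading lines never rewrites the log. */
    #include <stdio.h>

    /* Advance the stored offset past the next `n` lines of `logpath`.
     * The current offset is read from (and written back to) `idxpath`. */
    static int log_skip_lines(const char *logpath, const char *idxpath,
                              unsigned n)
    {
        long off = 0;
        FILE *idx = fopen(idxpath, "r");
        if (idx) {
            if (fscanf(idx, "%ld", &off) != 1)
                off = 0;
            fclose(idx);
        }

        FILE *log = fopen(logpath, "r");
        if (!log) return -1;
        if (fseek(log, off, SEEK_SET) != 0) { fclose(log); return -1; }

        int c;
        while (n > 0 && (c = fgetc(log)) != EOF)
            if (c == '\n')
                n--;
        off = ftell(log);       /* first byte after the skipped lines */
        fclose(log);

        idx = fopen(idxpath, "w");
        if (!idx) return -1;
        fprintf(idx, "%ld\n", off);
        return fclose(idx) == 0 ? 0 : -1;
    }
    ```

    Readers then fseek() to the stored offset before reading; the dead
    prefix is reclaimed only on an occasional full rewrite.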

    > Do you think the process could be reduced launching an external script
    > (for
    > example, 'head' based) with system()? If I redirect the output to the
    > original
    > filename I could avoid the time consuming process of copying the
    > original
    > to the temporary file.


    First, just what do you imagine the "head" program does, hmmm?

    However, on the systems I've encountered that provide a "head"
    utility and support "redirection," your solution is likely to run
    very quickly indeed. And save a lot of disk space, too! (Hint:
    Try it yourself: `head <foo.txt >foo.txt', then `ls -l foo.txt',
    and then you get to test your backups ...)

    But all this is mostly beside the point. You are worried about
    the time to copy 50K bytes: Have you *measured* the time? Have you
    actually found it to be a problem for your application? Or are you
    just imagining monsters under your bed? The fundamental theorem of
    all optimization is There Are No Monsters Until You've Measured Them.

    --
    Eric Sosman
    Eric Sosman, Nov 16, 2011
    #9
  10. pozz

    pozz Guest

    On 16 Nov, 02:50, Keith Thompson <> wrote:
    > Giuseppe <> writes:
    > > Ok, I implemented the "temporary file" solution and it works well.
    > > The only disadvantage is time: when the file is big (1000 lines of
    > > about 50 bytes each), the time to delete the first line could be very
    > > high.

    >
    > A text file of 1000 lines of 50 bytes each really isn't all that big.
    > The time to copy and rename it probably won't even be noticeable.


It takes about 100 ms to finish the shrink procedure. It's not a long
time on a desktop PC, but I'm working on embedded Linux based on an ARM9
processor.

The slowest part of my application is this. Anyway, I'm wondering whether
there are some simple improvements to reduce the time taken by this task.


    > > Do you think the process could be reduced launching an external script
    > > (for example, 'head' based) with system()?  If I redirect the output
    > > to the original filename I could avoid the time consuming process of
    > > copying the original to the temporary file.

    >
    > The behavior of external program is outside the scope of the C language.


Oh, I know, I was asking for an "off-topic" opinion :)


    > (But I'll mention that on Unix-like systems, running a command with its
    > input and output directed to the same file can cause serious problems;
    > it can easily end up reading a partially modified version of the file
    > instead of the original.  And even if it works, it's likely going to be
    > doing the same thing you would have done in your program.)


Ok, I won't try it.
    pozz, Nov 16, 2011
    #10
  11. pozz

    pozz Guest

    On 16 Nov, 03:48, Eric Sosman <> wrote:
    > On 11/15/2011 7:57 PM, Giuseppe wrote:
    > > Ok, I implemented the "temporary file" solution and it works well.
    > > The
    > > only disadvantage is time: when the file is big (1000 lines of about
    > > 50 bytes
    > > each), the time to delete the first line could be very high.

    >
    >      Fifty K shouldn't take long.  Even on a system from forty years
    > ago it didn't take long.  Even on paper tape, for goodness' sake, it
    > took less than a minute!


100 ms (see my answer to Keith above). It's not too much, but I was
thinking about improvements.


    >      For "really big" files (terabytes) copying most of the file from
    > one place to another could take an unacceptably long time.  Also, the
    > need to find space for a second nearly complete copy could be
    > troublesome.  In such cases you'd be justified in seeking fancier
    > solutions -- but I sincerely doubt that "slide all those terabytes
    > a couple hundred positions leftward" would produce a savings.  More
    > likely it would produce a slowdown, plus the risks you've already
    > mentioned about data loss in the event of an error.  No, the fancier
    > solution would probably involve some kind of an index external to the
    > file, describing which parts of the file were "live" and which "dead,"
    > and fancier routines to read just the live parts.


    Ok.


    > > Do you think the process could be reduced launching an external script
    > > (for
    > > example, 'head' based) with system()?  If I redirect the output to the
    > > original
    > > filename I could avoid the time consuming process of copying the
    > > original
    > > to the temporary file.

    >
    >      First, just what do you imagine the "head" program does, hmmm?
    >
    >      However, on the systems I've encountered that provide a "head"
    > utility and support "redirection," your solution is likely to run
    > very quickly indeed.  And save a lot of disk space, too!  (Hint:
    > Try it yourself: `head <foo.txt >foo.txt', then `ls -l foo.txt',
    > and then you get to test your backups ...)


    :)


    >      But all this is mostly beside the point.  You are worried about
    > the time to copy 50K bytes: Have you *measured* the time?  Have you
    > actually found it to be a problem for your application?  Or are you
    > just imagining monsters under your bed?  The fundamental theorem of
    > all optimization is There Are No Monsters Until You've Measured Them.
    pozz, Nov 16, 2011
    #11
  12. pozz

    Phil Carmody Guest

    Acid Washed China Blue Jeans <> writes:
    > In article <>,
    > Roberto Waltman <> wrote:
    > > pozz wrote:
    > > >I want to delete the first N lines from a file text.
    > > >...
    > > >The second approach is simpler,...
    > > >...
    > > >What do you think about those thoughts?

    > >
    > > Only that the second approach is not simpler.
    > > Also, depending on the underlying OS, it may not be possible to read
    > > from and write to the same file as you propose.

    >
    > Fopen with "r+". If fopen succeeds, the library has promised you you are allowed
    > to read and write an existing file.


    Being allowed to write to it at the point that you open the file
    doesn't mean that it's possible to write to the file at any point
    later in time.

    Think wire-cutters.

    Phil
    --
    Unix is simple. It just takes a genius to understand its simplicity
    -- Dennis Ritchie (1941-2011), Unix Co-Creator
    Phil Carmody, Nov 16, 2011
    #12
  13. pozz

    jgharston Guest

    pozz wrote:
    > It takes about 100ms to finish the shrink procedure.  It's not a long
    > time on a desktop PC, but I'm working on ambedded Linux based on ARM9
    > processor.


    Are you doing it byte by byte? Try buffering it, even chunks of
    16 bytes at a time will speed it up significantly. What's the
    biggest chunk of memory you can claim, use, release without
    memory fragmentation impacting your program more than acceptably?
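    (A standard-C version of this chunked copy, using plain malloc rather
    than any platform allocator, might look like the sketch below. The
    function name `copy_rest` and the 64 KiB buffer size are illustrative
    choices, not anything from the thread; on a constrained target the
    buffer would be sized to whatever can be claimed safely.)

    ```c
    /* Sketch: copy the remainder of `src` to `dst` in large chunks
     * instead of byte-by-byte. Buffer size is an arbitrary example. */
    #include <stdio.h>
    #include <stdlib.h>

    #define COPY_BUF_SIZE 65536u   /* illustrative; tune for the target */

    /* Returns 0 on success, -1 on allocation or I/O failure. */
    static int copy_rest(FILE *src, FILE *dst)
    {
        char *buf = malloc(COPY_BUF_SIZE);
        size_t n;
        if (!buf)
            return -1;
        while ((n = fread(buf, 1, COPY_BUF_SIZE, src)) > 0) {
            if (fwrite(buf, 1, n, dst) != n) {
                free(buf);
                return -1;
            }
        }
        free(buf);
        return ferror(src) ? -1 : 0;
    }
    ```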

    JGH
    jgharston, Nov 16, 2011
    #13
  14. pozz

    -.- Guest

    jacob navia was trying to save the world with his stuff:

    > Using the containers library (and if your file fits in memory)
    >
    > #include <containers.h>


    You self-celebrating fucko. There only exist your things to you:
    that silly lcc-win and your funny containers.
    Stop making this newsgroup your personal advertisements page.
    -.-, Nov 16, 2011
    #14
  15. pozz

    jacob navia Guest

    Le 16/11/11 14:01, -.- a écrit :
    > jacob navia was trying to save the world with his stuff:
    >
    >> Using the containers library (and if your file fits in memory)
    >>
    >> #include <containers.h>

    >
    > You self-celebrating fucko.


    That is why you hide behind a pseudo, because you have the courage of
    your opinions...
    jacob navia, Nov 16, 2011
    #15
  16. pozz

    BartC Guest

    "pozz" <> wrote in message
    news:...
    > On 16 Nov, 03:48, Eric Sosman <> wrote:
    >> On 11/15/2011 7:57 PM, Giuseppe wrote:
    >> > Ok, I implemented the "temporary file" solution and it works well.
    >> > The
    >> > only disadvantage is time: when the file is big (1000 lines of about
    >> > 50 bytes
    >> > each), the time to delete the first line could be very high.

    >>
    >> Fifty K shouldn't take long. Even on a system from forty years
    >> ago it didn't take long. Even on paper tape, for goodness' sake, it
    >> took less than a minute!


    (That's a fast paper tape reader. The last one I used would have taken
    nearly 3 hours.)

    > 100ms (see my answer to Keith above). It's not too much, but I was
    > thingking
    > about improvements.


    How long for a file containing ten lines instead of 1000? How long for
    double the number of lines?

    That will tell you the overheads involved and the fastest speed achievable.

    While you're about, how long does it take to create a file, write 50,000
    bytes to it (of anything) and close it? And how long to read such a file?

    Take care when taking measurements, to eliminate the effects of
    disk-caching.
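    (A rough harness for such measurements, in the spirit of "measure
    first", could be as simple as the sketch below. clock() is standard C
    but measures CPU time; on an embedded target a wall-clock source may be
    more representative, and the `time_run`/`sample_work` names and the
    stand-in workload are illustrative assumptions, not from the thread.)

    ```c
    /* Sketch: time one run of a function with standard C's clock(). */
    #include <stdio.h>
    #include <time.h>

    /* Run `fn` once and return the CPU seconds it took. */
    static double time_run(void (*fn)(void))
    {
        clock_t t0 = clock();
        fn();
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    /* Illustrative workload standing in for the copy-and-rename step. */
    static void sample_work(void)
    {
        volatile long sink = 0;
        for (long i = 0; i < 1000000L; i++)
            sink += i;
    }
    ```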

    --
    Bartc
    BartC, Nov 16, 2011
    #16
  17. pozz

    jgharston Guest

    Try replacing:
    >         while((c = fgetc(f)) != EOF) {
    >                 fputc(c, ftmp);
    >         }


    with:
    bsize=m_free(0);
    buff=m_alloc(bsize);
    numread=-1;

    while(numread) {
    numread=fread(buff,1,bsize,f);
    fwrite(buff,1,numread,ftmp);
    }
    m_free(buff);

    As with usenet tradition, completely untested.

    JGH
    jgharston, Nov 16, 2011
    #17
  18. pozz

    jgharston Guest

    jgharston wrote:
    >         bsize=m_free(0);
    >         buff=m_alloc(bsize);


    Following up my own post, that call to m_free(0) is supposed to
    return a size of a free block that can subsequently be claimed
    with m_alloc(). A bit of a skim of through the web shows that
    functionality isn't in any of the malloc libraries documented
    there. All I can say is it worked 25 years ago! and inspired
    me to include that functionality in my own malloc library.

    Just replace bsize=m_free(0) with a suitable bsize=(some
    method of deciding an amount of memory to claim).

    JGH
    jgharston, Nov 16, 2011
    #18
  19. jgharston <> writes:
    > Try replacing:
    >>         while((c = fgetc(f)) != EOF) {
    >>                 fputc(c, ftmp);
    >>         }

    >
    > with:
    > bsize=m_free(0);
    > buff=m_alloc(bsize);
    > numread=-1;
    >
    > while(numread) {
    > numread=fread(buff,1,bsize,f);
    > fwrite(buff,1,numread,ftmp);
    > }
    > m_free(buff);
    >
    > As with usenet tradition, completely untested.


    Leaving aside the m_free and m_alloc calls, why do you assume that this
    will be significantly faster than the fgetc/fputc loop? stdio does its
    own buffering.
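    (One portable knob, if the target's default BUFSIZ turns out to be
    small, is to hand stdio a larger buffer with setvbuf, which is standard
    C. A sketch; the `enlarge_buffer` name and the 64 KiB size are
    illustrative assumptions, not a recommendation from the thread.)

    ```c
    /* Sketch: enlarge a stream's stdio buffer with standard C's setvbuf. */
    #include <stdio.h>

    /* Make `f` fully buffered with an internally allocated `size`-byte
     * buffer. Must be called after fopen and before any other operation
     * on the stream; returns 0 on success. */
    static int enlarge_buffer(FILE *f, size_t size)
    {
        return setvbuf(f, NULL, _IOFBF, size) == 0 ? 0 : -1;
    }
    ```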

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Nov 16, 2011
    #19
  20. pozz

    jgharston Guest

    Keith Thompson wrote:
    > Leaving aside the m_free and m_alloc calls, why do you assume that this
    > will be significantly faster than the fgetc/fputc loop?  stdio does its
    > own buffering.


As I recall, this was a standard exam question back when I worra
litt'un. When doing bulk data copying, a program's own buffer is likely
to be bigger than stdio's, and bulk read/write/read/write is more
efficient for simply chucking large lumps of data from one place to
another; one reason being that it skips fgetc's unget machinery.

    JGH
    jgharston, Nov 16, 2011
    #20
