simple file compression program

Discussion in 'C Programming' started by sophia, Mar 26, 2008.

  1. sophia

    sophia Guest

    Dear all,

    The following is a file compression program, using elimination of
    spaces, which I saw in a book:

    #include<stdio.h>
    #include<stdlib.h>

    int main(int argc,char * argv[])
    {

        FILE* fs,*ft;

        fs = fopen(argv[1],"r");
        if(fs == NULL)
        {
            printf("\n Cannot open the file %s",argv[1]);
            exit(1);
        }

        ft = fopen(argv[2],"w");
        if(fs == NULL)
        {
            printf("\n Cannot open the file %s",argv[2]);
            exit(1);
        }

        while( (ch=fgetc(fs)) != EOF)
        {

            if(ch == 32)
            {
                if( (ch=fgetc(fs)) != EOF)
                    fputc(ch+127,ft);
            }
            else
                fputc(ch,ft);

        }

        fclose(fs);
        fclose(ft);

        return EXIT_SUCCESS;
    }

    Now my question is as follows:

    1) Is there any other, simpler method to compress text files, similar
    to the above program (other than standard algorithms like Huffman or LZW)?
    sophia, Mar 26, 2008
    #1

  2. sophia

    Guest

    On Mar 26, 3:09 pm, sophia <> wrote:
    > if(ch == 32)
    > {
    > if( (ch=fgetc(fs)) != EOF)
    > fputc(ch+127,ft);
    > }
    > else
    > fputc(ch,ft);


    What happens when the character represented by the value 32 is the
    last character in the file? You are not writing any representation of
    that character to your output file. You will not be able to recreate
    your source file.

    > Now my questions are as as follows
    >
    > 1) Is there any other simpler method to compress text files, similar
    > to the above program(Other than standard algorithms like huffman,LZW)


    Yes. Not really a C issue, however. First define what you mean by 'text
    file', then devise a way of mapping the (smaller) domain of your text
    file into the (larger) domain of an unsigned char. And don't forget to
    open your destination file for binary access.
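One way to fix the lost trailing space is to keep a lone trailing space as an ordinary space, and to pair the encoder with a decoder so the round trip can actually be checked. The sketch below works on memory buffers; the names pack_spaces and unpack_spaces are hypothetical, and it assumes ASCII-like 7-bit input containing only byte values 1 to 126, so that every output byte of 128 or more can only be a packed "space + character" pair.

```c
#include <stddef.h>

/* Hypothetical sketch of the book's "space + next character" scheme, made
   reversible. ASSUMPTION: input bytes are all in 1..126, so ch + 127 lands
   in 128..253 and can never be confused with an ordinary input byte. */

/* Encode in[0..n) into out and return the number of bytes written.
   The output is never longer than the input, so out needs n bytes. */
size_t pack_spaces(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        if (in[i] == ' ' && i + 1 < n) {
            /* a space followed by any byte becomes a single byte */
            out[o++] = (unsigned char)(in[i + 1] + 127);
            i += 2;
        } else {
            /* ordinary byte, or a trailing space kept as a plain space */
            out[o++] = in[i++];
        }
    }
    return o;
}

/* Decode in[0..n) into out and return the number of bytes written.
   Each input byte expands to at most two bytes, so out needs 2*n bytes. */
size_t unpack_spaces(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        if (in[i] >= 128) {
            out[o++] = ' ';
            out[o++] = (unsigned char)(in[i] - 127);
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

For example, the 7 bytes "a b  c " pack into 5 bytes, and the final space survives the round trip.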
    , Mar 26, 2008
    #2

  3. In article <>,
    sophia <> wrote:

    >the following is the file compression program ,using elimination of
    >spaces, which I saw in a book
    >
    >#include<stdio.h>
    >#include<stdlib.h>
    >
    >int main(int argc,char * argv[])
    >{
    >
    > FILE* fs,*ft;
    >
    > fs = fopen(argv[1],"r");
    > if(fs == NULL)
    > {
    > printf("\n Cannot open the file %s",argv[1]);


    You are not outputting a \n as the last character. It is
    implementation-defined as to whether the last output line will
    appear in such a case (and it is also possible that it will appear
    but then be immediately overwritten by the next shell prompt, making
    it seem that it did not appear).

    Error messages are better output to stderr.

    > exit(1);


    exit(1) does not have a portably defined effect. The only arguments
    with a defined meaning are 0, EXIT_SUCCESS and EXIT_FAILURE.
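A minimal sketch of the error handling suggested above: the message goes to stderr, ends with a newline, and the caller reports failure with EXIT_FAILURE rather than 1. The helper name open_or_report is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open a file, or report the failure on stderr.
   Typical use in main():
       FILE *fs = open_or_report(argv[1], "rb");
       if (fs == NULL)
           return EXIT_FAILURE;   (from <stdlib.h>; portable, unlike 1)  */
FILE *open_or_report(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);
    if (fp == NULL)
        fprintf(stderr, "Cannot open the file %s\n", name); /* trailing \n */
    return fp;
}
```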

    > }
    >
    > ft = fopen(argv[2],"w");
    > if(fs == NULL)
    > {
    > printf("\n Cannot open the file %s",argv[2]);
    > exit(1);
    > }
    >
    >while( (ch=fgetc(fs)) != EOF)


    You have not declared ch by this point. The exact type of ch
    is important to the program. For example, if it were declared as
    'char' and 'char' happened to be unsigned on that system, then
    it would not be possible for ch to compare equal to EOF, which is
    always negative, so the loop would never end.
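The usual idiom is to hold the result of fgetc in an int, which can represent every byte value as well as EOF. The sketch below (count_bytes is a hypothetical name) just counts bytes, but the loop is the same one the compressor needs.

```c
#include <stdio.h>

/* Count the bytes in a stream. fgetc returns either an unsigned char
   value converted to int, or the negative value EOF. */
long count_bytes(FILE *fp)
{
    int ch;   /* NOT char: if char is unsigned, ch could never equal EOF;
                 if char is signed, a 0xFF byte would typically become -1
                 and be mistaken for EOF */
    long n = 0;
    while ((ch = fgetc(fp)) != EOF)
        n++;
    return n;
}
```

With an int, a stream containing the bytes 0xFF and 'a' is counted as 2 bytes rather than stopping at the 0xFF.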

    >{
    >
    > if(ch == 32)


    What is 32? If you mean a space, code a space, ' ' . The numerical
    values of particular characters are not specified in C.

    > {
    > if( (ch=fgetc(fs)) != EOF)
    > fputc(ch+127,ft);


    As the character set representation is not specified by C, it
    is possible that ch+127 is itself a valid character in the character
    set, one that could also appear in the input, so the output could not
    be decoded unambiguously.

    If the file ends in a 32 then that trailing 32 will be lost with
    your logic.

    I note that you do not open the file in binary mode. It could
    happen that in the input, there were often space characters immediately
    preceding end-of-line indicators. The end-of-line indicators would
    be read as '\n' and that '\n' would be transformed by your compressor
    to '\n'+127, which is unlikely to be an end-of-line indicator. You
    could thus end up with output lines that exceeded the maximum text
    output line size supported by the implementation. You could also
    potentially happen upon characters for which the character + 127
    came out as '\n', thus introducing an end of line where there was none
    before.
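As a sketch of the binary-mode alternative: a stream opened in binary mode performs no newline translation, so codes such as '\n'+127 come back exactly as written. tmpfile() opens its file with mode "wb+", so it is used here to avoid touching the file system; the helper name binary_roundtrip_ok is hypothetical, and the check assumes 8-bit bytes.

```c
#include <stdio.h>

/* Write every byte value 0..255 to a binary stream and read it back.
   The standard allows an implementation to append null bytes to the end
   of a binary file, so only the first 256 bytes are compared. */
int binary_roundtrip_ok(void)
{
    FILE *fp = tmpfile();   /* opened in binary update mode, "wb+" */
    int c, ok = 1;
    if (fp == NULL)
        return 0;
    for (c = 0; c < 256; c++)
        fputc(c, fp);
    rewind(fp);             /* required between writing and reading */
    for (c = 0; c < 256; c++)
        if (fgetc(fp) != c)
            ok = 0;
    fclose(fp);
    return ok;
}
```

For named files the same effect comes from passing "rb" and "wb" to fopen instead of "r" and "w".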

    > }
    > else
    > fputc(ch,ft);
    >
    >}
    >
    > fclose(fs);
    > fclose(ft);
    >
    > return EXIT_SUCCESS;
    >}



    >Now my questions are as as follows
    >
    >1) Is there any other simpler method to compress text files, similar
    >to the above program(Other than standard algorithms like huffman,LZW)


    Yes, many of them, most equally inefficient. The code you give at
    best compresses space followed by a character to a different character
    code, and leaves everything else alone -- it doesn't even try to
    compress runs of spaces into something more efficient. If the code
    were to be applied to typical English text, it would produce a
    more efficient output if, instead of compressing spaces, it compressed
    'e', 't', 'a', 'i', 'o', or 'n', all of which occur in English text
    with greater frequency than space does.
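Which characters are worth special codes depends on the text being compressed. A simple way to check for a given file is to tally byte frequencies first; the sketch below does that (byte_frequencies is a hypothetical name, and freq must have UCHAR_MAX + 1 elements).

```c
#include <stdio.h>
#include <limits.h>

/* Tally how often each byte value occurs in a stream, so a substitution
   scheme can target the characters that are actually most common. */
void byte_frequencies(FILE *fp, unsigned long freq[UCHAR_MAX + 1])
{
    int ch, i;
    for (i = 0; i <= UCHAR_MAX; i++)
        freq[i] = 0;
    while ((ch = fgetc(fp)) != EOF)
        freq[ch]++;   /* ch is in 0..UCHAR_MAX here */
}
```

For the input "see the tree", freq['e'] ends up as 5 and freq[' '] as 2.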
    --
    "The whole history of civilization is strewn with creeds and
    institutions which were invaluable at first, and deadly
    afterwards." -- Walter Bagehot
    Walter Roberson, Mar 26, 2008
    #3
  4. "sophia" <> wrote in message
    > 1) Is there any other simpler method to compress text files, similar
    > to the above program(Other than standard algorithms like huffman,LZW)
    >

    squnch compression. It's a sliding dictionary method that has seen
    industrial use because of its super-fast decompression. Look in the Basic
    Algorithms pages of my website.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Mar 26, 2008
    #4
  5. Bartc

    Bartc Guest

    "sophia" <> wrote in message
    news:...
    > Dear all,
    >
    > the following is the file compression program ,using elimination of
    > spaces, which I saw in a book

    ....
    Now my questions are as as follows
    >
    > 1) Is there any other simpler method to compress text files, similar
    > to the above program(Other than standard algorithms like huffman,LZW)


    Knowing nothing about compression, I had a go myself.

    My first attempt looked promising, but I wasn't processing the entire file
    so it was actually *doubling* the size!

    I had a second attempt, and I think that, if done properly (tying up all
    the loose ends), it could achieve a 20-30% reduction. But it is not that
    simple; in fact it's very fiddly (and requires two passes over the input).
    I guess I could get it up to 50% if I tried hard.

    What compression levels are you trying to achieve? And how simple do you
    want it?

    In practice I guess it would be a much better idea to use an existing
    compression library, unless you like a challenge.

    --
    Bart
    Bartc, Mar 26, 2008
    #5
  6. On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
    <> wrote:

    >Dear all,
    >
    >the following is the file compression program ,using elimination of
    >spaces, which I saw in a book


    Was it listed as a bad example? Perhaps the book was intended as a
    satire?

    >
    >#include<stdio.h>
    >#include<stdlib.h>
    >
    >int main(int argc,char * argv[])
    >{
    >
    > FILE* fs,*ft;
    >
    > fs = fopen(argv[1],"r");


    How does the program know argv[1] is not NULL or for that matter that
    it even exists?
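A minimal sketch of validating the command line before touching argv[1] and argv[2]. It also opens both files in binary mode and checks the output stream itself (the book's code tests fs a second time instead of ft). The helper name open_io and its return convention (0 on success, -1 on failure) are hypothetical.

```c
#include <stdio.h>

/* Check argc, then open argv[1] for reading and argv[2] for writing,
   both in binary mode. Returns 0 on success, -1 on failure. */
int open_io(int argc, char *argv[], FILE **in, FILE **out)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s infile outfile\n",
                argc > 0 && argv[0] != NULL ? argv[0] : "compress");
        return -1;
    }
    if ((*in = fopen(argv[1], "rb")) == NULL) {
        fprintf(stderr, "Cannot open the file %s\n", argv[1]);
        return -1;
    }
    if ((*out = fopen(argv[2], "wb")) == NULL) {   /* check *out, not *in */
        fprintf(stderr, "Cannot open the file %s\n", argv[2]);
        fclose(*in);
        return -1;
    }
    return 0;
}
```

main() would then start with `if (open_io(argc, argv, &fs, &ft) != 0) return EXIT_FAILURE;`.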

    > if(fs == NULL)
    > {
    > printf("\n Cannot open the file %s",argv[1]);
    > exit(1);
    > }
    >
    > ft = fopen(argv[2],"w");
    > if(fs == NULL)
    > {
    > printf("\n Cannot open the file %s",argv[2]);
    > exit(1);
    > }
    >
    >while( (ch=fgetc(fs)) != EOF)


    Where is ch declared?

    >{
    >
    > if(ch == 32)


    32 is not the value of ' ' on my system.

    > {
    > if( (ch=fgetc(fs)) != EOF)
    > fputc(ch+127,ft);


    On my system adding 127 to a printable character value will produce a
    value that won't fit in a char. While this technically isn't overflow
    since fputc takes an int, it will mess up the output file.

    It appears to skip only one space. And it does so without regard to
    whether the space is "significant".

    > }
    > else
    > fputc(ch,ft);
    >
    >}
    >
    > fclose(fs);
    > fclose(ft);
    >
    > return EXIT_SUCCESS;
    >}
    >
    >Now my questions are as as follows
    >
    >1) Is there any other simpler method to compress text files, similar
    >to the above program(Other than standard algorithms like huffman,LZW)



    Barry Schwarz, Mar 27, 2008
    #6
  7. sophia

    sophia Guest

    On Mar 27, 10:11 am, Barry Schwarz <> wrote:
    > On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
    >
    > <> wrote:
    > >Dear all,

    >
    > >the following is the file compression program ,using elimination of
    > >spaces, which I saw in a book

    >
    > Was it listed as a bad example?  Perhaps the book was intended as a
    > satire?


    I don't know whether the book was intended as satire or not.
    The book's ISBN is 81-7656-537-7, and this program is given on
    page 55.

    > >while( (ch=fgetc(fs)) != EOF)

    >
    > Where is ch declared?
    >
    > >{

    >
    > >  if(ch == 32)

    >
    > 32 is not the value of ' ' on my system.
    >
    > >  {
    > >      if( (ch=fgetc(fs)) != EOF)
    > >      fputc(ch+127,ft);

    >
    > On my system adding 127 to a printable character value will produce a
    > value that won't fit in a char.  While this technically isn't overflow
    > since fputc takes an int, it will mess up the output file.
    >
    > It appears to skip only one space.  And it does so without regard to
    > whether the space is "significant".
    >
    >

    I think that, to skip more than one space, the following changes can be
    made (assuming 32 stands for ' '):


    while( (ch=fgetc(fs)) != EOF)
    {
        if(ch == 32)
        {
            count = 1;
            while( (ch=fgetc(fs)) == 32)
                count++;
            fputc(count+127,ft);
        }
        fputc(ch,ft);
    }

    Does fputc take a signed int or an unsigned int?
    sophia, Mar 27, 2008
    #7
  8. santosh

    santosh Guest

    sophia wrote:

    > On Mar 27, 10:11 am, Barry Schwarz <> wrote:
    >> On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
    >>
    >> <> wrote:
    >> >Dear all,

    >>
    >> >the following is the file compression program ,using elimination of
    >> >spaces, which I saw in a book

    >>
    >> Was it listed as a bad example?  Perhaps the book was intended as a
    >> satire?

    >
    > i don't know if the book was intended as sattire or not .
    > The book ISBN number is 81-7656-537-7 and this program is given in
    > page no: 55
    >
    >> >while( (ch=fgetc(fs)) != EOF)

    >>
    >> Where is ch declared?
    >>
    >> >{

    >>
    >> > if(ch == 32)

    >>
    >> 32 is not the value of ' ' on my system.
    >>
    >> > {
    >> > if( (ch=fgetc(fs)) != EOF)
    >> > fputc(ch+127,ft);

    >>
    >> On my system adding 127 to a printable character value will produce a
    >> value that won't fit in a char.  While this technically isn't
    >> overflow since fputc takes an int, it will mess up the output file.
    >>
    >> It appears to skip only one space.  And it does so without regard to
    >> whether the space is "significant".
    >>
    >>

    > i think to skip more than one space the following changes can be
    > made(assuming 32 stands for ' ')
    >
    >
    > while( (ch=fgetc(fs)) != EOF)
    > {
    > if(ch == 32)


    Why not make this ASCII independent by replacing 32 with ' '?

    > {
    > count = 1;
    > while( (ch=fgetc(fs)) == 32)
    > count++;
    > fputc(count+127,ft);


    And this is also implementation defined behaviour.

    > }
    > fputc(ch,ft);
    > }
    >
    > fputc takes signed int or unsigned int ?


    It takes a signed int argument, but converts that to an unsigned char
    before writing to the stream. If the write fails it returns EOF;
    otherwise it returns the character it wrote, converted to int.
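Because of that conversion, an out-of-range code such as count+127 for a long run is not an error as far as fputc is concerned; it is silently reduced modulo UCHAR_MAX + 1. The hypothetical helper put_code below rejects such values before they reach fputc and passes fputc's own return value through otherwise.

```c
#include <stdio.h>
#include <limits.h>

/* Write one code byte. fputc would convert c to unsigned char, so with
   8-bit chars a value such as 300 would silently be written as 44.
   This hypothetical helper treats such values as errors instead. */
int put_code(int c, FILE *fp)
{
    if (c < 0 || c > UCHAR_MAX)
        return EOF;          /* would be truncated by fputc */
    return fputc(c, fp);     /* the byte written, as an int, or EOF */
}
```

With 8-bit chars, `fputc(256 + 'A', fp)` writes and returns 'A', while `put_code(256 + 'A', fp)` returns EOF and writes nothing.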
    santosh, Mar 27, 2008
    #8
  9. On Thu, 27 Mar 2008 02:45:44 -0700 (PDT), sophia
    <> wrote:

    >On Mar 27, 10:11 am, Barry Schwarz <> wrote:
    >> On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
    >>
    >> <> wrote:
    >> >Dear all,

    >>
    >> >the following is the file compression program ,using elimination of
    >> >spaces, which I saw in a book

    >>
    >> Was it listed as a bad example?  Perhaps the book was intended as a
    >> satire?

    >
    >i don't know if the book was intended as sattire or not .
    >The book ISBN number is 81-7656-537-7 and this program is given in
    >page no: 55
    >
    >> >while( (ch=fgetc(fs)) != EOF)

    >>
    >> Where is ch declared?
    >>
    >> >{

    >>
    >> >  if(ch == 32)

    >>
    >> 32 is not the value of ' ' on my system.
    >>
    >> >  {
    >> >      if( (ch=fgetc(fs)) != EOF)
    >> >      fputc(ch+127,ft);

    >>
    >> On my system adding 127 to a printable character value will produce a
    >> value that won't fit in a char.  While this technically isn't overflow
    >> since fputc takes an int, it will mess up the output file.
    >>
    >> It appears to skip only one space.  And it does so without regard to
    >> whether the space is "significant".
    >>
    >>

    > i think to skip more than one space the following changes can be
    >made(assuming 32 stands for ' ')


    Why assume something known to be false when the expression ' ' will
    work every time?

    >
    >
    >while( (ch=fgetc(fs)) != EOF)
    >{
    > if(ch == 32)
    > {
    > count = 1;
    > while( (ch=fgetc(fs)) == 32)


    What happens if the last three characters in the stream are blank?
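One way to answer that question is to keep the character that ends a run in ch and let the main loop decide what to do with it, so EOF is never passed to fputc. The sketch below keeps the count+127 idea from the code above but caps a run at 128 spaces so the code always fits in an 8-bit byte (longer runs are split). The names squeeze_spaces, expand_spaces and MAX_RUN are hypothetical, and it assumes 8-bit bytes, input bytes in 0..127 only, and streams opened in binary mode.

```c
#include <stdio.h>

#define MAX_RUN 128   /* 127 + 128 == 255, the largest 8-bit code */

/* Replace each run of 1..MAX_RUN spaces by one byte 127 + count (128..255).
   Returns 0 on success, EOF on a write error. */
int squeeze_spaces(FILE *in, FILE *out)
{
    int ch = fgetc(in);
    while (ch != EOF) {
        if (ch == ' ') {
            int count = 0;
            while (ch == ' ' && count < MAX_RUN) {
                count++;
                ch = fgetc(in);
            }
            if (fputc(127 + count, out) == EOF)
                return EOF;
            /* ch is now a non-space, another space (run split), or EOF;
               the next loop iteration handles it, so EOF is never written */
        } else {
            if (fputc(ch, out) == EOF)
                return EOF;
            ch = fgetc(in);
        }
    }
    return 0;
}

/* Inverse: expand codes 128..255 back into runs of spaces. */
int expand_spaces(FILE *in, FILE *out)
{
    int ch;
    while ((ch = fgetc(in)) != EOF) {
        int n = ch >= 128 ? ch - 127 : 1;
        int c = ch >= 128 ? ' ' : ch;
        while (n-- > 0)
            if (fputc(c, out) == EOF)
                return EOF;
    }
    return 0;
}
```

With this version, "a   b   " (three trailing blanks) compresses to 4 bytes and expands back unchanged, and a run of 300 spaces becomes the 3 codes 255, 255, 171.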

    > count++;
    > fputc(count+127,ft);
    > }
    > fputc(ch,ft);
    >}
    >
    > fputc takes signed int or unsigned int ?



    Barry Schwarz, Mar 28, 2008
    #9
Similar Threads
  1. b
    Replies:
    1
    Views:
    334
    Kevin Goodsell
    Sep 22, 2003
  2. b

    Program for compression

    b, Sep 22, 2003, in forum: C Programming
    Replies:
    1
    Views:
    318
    Kevin Goodsell
    Sep 22, 2003
  3. Elaine Jackson

    file compression

    Elaine Jackson, Jun 29, 2004, in forum: Python
    Replies:
    8
    Views:
    757
    Peter Maas
    Jun 29, 2004
  4. westward

    video file compression

    westward, Sep 4, 2007, in forum: ASP .Net
    Replies:
    0
    Views:
    426
    westward
    Sep 4, 2007
  5. A. Bonslater
    Replies:
    0
    Views:
    352
    A. Bonslater
    Sep 5, 2008