simple file compression program

sophia

Dear all,

The following is a file compression program, using elimination of
spaces, which I saw in a book:

#include<stdio.h>
#include<stdlib.h>

int main(int argc,char * argv[])
{

FILE* fs,*ft;

fs = fopen(argv[1],"r");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[1]);
exit(1);
}

ft = fopen(argv[2],"w");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[2]);
exit(1);
}

while( (ch=fgetc(fs)) != EOF)
{

if(ch == 32)
{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);
}
else
fputc(ch,ft);

}

fclose(fs);
fclose(ft);

return EXIT_SUCCESS;
}

Now my questions are as follows:

1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?
 
mstorkamp

if(ch == 32)
{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);
}
else
fputc(ch,ft);

What happens when the character represented by the value 32 is the
last character in the file? You are not writing any representation of
that character to your output file. You will not be able to recreate
your source file.
Now my questions are as follows:

1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?

Yes, though it is not really a C issue. First define what you mean by 'text
file', then devise a way of mapping the (smaller) domain of your text
file into the (larger) domain of an unsigned char. And don't forget to
open your destination file for binary access.
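
As one illustration of the mapping idea above, suppose "text file" is taken to mean 7-bit ASCII only (an assumption; nothing in the thread fixes the character set). Every character then leaves one bit of an 8-bit byte unused, so eight characters can be packed into seven bytes. The names pack7 and unpack7 are made up for this sketch, and it assumes 8-bit bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Pack n 7-bit values from src into dst, least significant bit first.
   dst must have room for (n * 7 + 7) / 8 bytes. Returns bytes written.
   Precondition (assumption of this sketch): every src[i] is below 128. */
size_t pack7(const unsigned char *src, size_t n, unsigned char *dst)
{
    unsigned long acc = 0;   /* bit accumulator */
    int nbits = 0;           /* number of valid bits held in acc */
    size_t out = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(src[i] < 128);
        acc |= (unsigned long)src[i] << nbits;
        nbits += 7;
        while (nbits >= 8) {             /* flush every complete byte */
            dst[out++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            nbits -= 8;
        }
    }
    if (nbits > 0)                       /* flush the final partial byte */
        dst[out++] = (unsigned char)(acc & 0xFF);
    return out;
}

/* Recover n 7-bit values from src into dst. The caller must know n,
   because the padding bits in the last byte are ambiguous. */
void unpack7(const unsigned char *src, size_t n, unsigned char *dst)
{
    unsigned long acc = 0;
    int nbits = 0;
    size_t in = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (nbits < 7) {                 /* refill when fewer than 7 bits remain */
            acc |= (unsigned long)src[in++] << nbits;
            nbits += 8;
        }
        dst[i] = (unsigned char)(acc & 0x7F);
        acc >>= 7;
        nbits -= 7;
    }
}
```

The packed data is binary, so the output file would need to be opened with "wb", and the original character count has to be stored somewhere so that the decoder knows where to stop.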
 
Walter Roberson

sophia said:
The following is a file compression program, using elimination of
spaces, which I saw in a book:

#include<stdio.h>
#include<stdlib.h>

int main(int argc,char * argv[])
{

FILE* fs,*ft;

fs = fopen(argv[1],"r");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[1]);

You are not outputting a \n as the last character. It is
implementation-defined as to whether the last output line will
appear in such a case (and it is also possible that it will appear
but then be immediately overwritten by the next shell prompt, making
it seem that it did not appear).

Error messages are better output to stderr.

The effect of exit(1) is implementation-defined. The only arguments
with a portable meaning are 0, EXIT_SUCCESS and EXIT_FAILURE.
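
The three points above (terminate the message with \n, write it to stderr, and use EXIT_FAILURE) can be sketched as follows. The helper names open_or_report and run are hypothetical, the argument-count check is an extra precaution, and the binary modes "rb" and "wb" are an assumption that suits a compressor:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Open path with the given mode. On failure, report the problem on
   stderr as a complete line and return NULL. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp;

    assert(path != NULL && mode != NULL);
    fp = fopen(path, mode);
    if (fp == NULL)
        fprintf(stderr, "Cannot open the file %s\n", path);
    return fp;
}

/* Meant to be called from main as: return run(argc, argv); */
int run(int argc, char *argv[])
{
    FILE *fs, *ft;

    if (argc != 3) {
        fprintf(stderr, "usage: %s source target\n",
                argc > 0 ? argv[0] : "compress");
        return EXIT_FAILURE;
    }
    fs = open_or_report(argv[1], "rb");
    if (fs == NULL)
        return EXIT_FAILURE;
    ft = open_or_report(argv[2], "wb");
    if (ft == NULL) {
        fclose(fs);
        return EXIT_FAILURE;
    }

    /* ... the compression loop would go here ... */

    fclose(fs);
    if (fclose(ft) == EOF)       /* a failed close can mean lost output */
        return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
```

Returning a status from run instead of calling exit() keeps the cleanup in one place, which is a design choice of this sketch and not something the thread prescribes.
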
}

ft = fopen(argv[2],"w");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[2]);
exit(1);
}

while( (ch=fgetc(fs)) != EOF)

You have not declared ch by this point. The exact definition of ch
is important to the program. For example, if it were declared as
'char' and 'char' happened to be unsigned on that system, then
it would not be possible for ch to compare equal to EOF, which is
always negative.
{

if(ch == 32)

What is 32? If you mean a space, code a space, ' ' . The numerical
values of particular characters are not specified in C.
{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);

As the character set representation is not specified by C, it
is possible that ch+127 is itself a valid character in the character
set, in which case a decompressor could not tell the two apart.

If the file ends in a 32 then that trailing 32 will be lost with
your logic.

I note that you do not open the file in binary mode. It could
happen that in the input, there were often space characters immediately
preceding end-of-line indicators. The end of line indicators would
be read as '\n' and that '\n' would be transformed by your compressor
to '\n'+127 which is unlikely to be an end of line indicator. You
could thus end up with output lines that exceeded the maximum text
output line size supported by the implementation. You could also
potentially happen upon characters for which the character + 127
came out as '\n', thus introducing an end of line where there was none
before.
}
else
fputc(ch,ft);

}

fclose(fs);
fclose(ft);

return EXIT_SUCCESS;
}

Now my questions are as follows:

1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?

Yes, many of them, most equally inefficient. The code you give at
best compresses space followed by a character to a different character
code, and leaves everything else alone -- it doesn't even try to
compress runs of spaces into something more efficient. If the code
were to be applied to typical English text, it would produce a
more efficient output if, instead of compressing spaces, it compressed
'e', 't', 'a', 'i', 'o', or 'n', all of which occur in English text
with greater frequency than space does.
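
Character frequencies vary from one collection of text to another, so before choosing which character to squeeze out it is safer to measure a sample of the files in question than to rely on a rule of thumb. A minimal sketch follows; the names count_bytes and most_frequent are made up here, and the data is assumed to be already in memory:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Tally how often each byte value occurs in buf[0..n-1]. */
void count_bytes(const unsigned char *buf, size_t n,
                 unsigned long counts[UCHAR_MAX + 1])
{
    size_t i;

    assert(buf != NULL && counts != NULL);
    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (i = 0; i < n; i++)
        counts[buf[i]]++;
}

/* Return the byte value with the highest count.
   On a tie, the lowest byte value wins. */
int most_frequent(const unsigned long counts[UCHAR_MAX + 1])
{
    int best = 0;
    int i;

    for (i = 1; i <= UCHAR_MAX; i++)
        if (counts[i] > counts[best])
            best = i;
    return best;
}
```

To apply this to a file, read it with fread from a stream opened in "rb" mode and pass the buffer to count_bytes; the counts for ' ', 'e', 't' and so on can then be compared directly.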
 
Malcolm McLean

sophia said:
1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?

squnch compression. It's a sliding dictionary method that has seen
industrial use because of its super-fast decompression. Look in the Basic
Algorithms pages of my website.
 
Bartc

sophia said:
Dear all,

The following is a file compression program, using elimination of
spaces, which I saw in a book:
....
Now my questions are as follows:
1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?

Knowing nothing about compression, I had a go myself.

My first attempt looked promising, but I wasn't processing the entire file
so it was actually *doubling* the size!

Had a second attempt, and I think if done properly (tie up all loose ends)
that could achieve 20-30% (reduction that is). But it is not that simple. In
fact it's very fiddly (and requires 2 passes of the input). I guess I could
get it up to 50% if I tried hard.

What compression levels are you trying to achieve? And how simple do you
want it?

In practice I guess it would be a much better idea to use an existing
compression library, unless you like a challenge.
 
Barry Schwarz

Dear all,

The following is a file compression program, using elimination of
spaces, which I saw in a book:

Was it listed as a bad example? Perhaps the book was intended as a
satire?
#include<stdio.h>
#include<stdlib.h>

int main(int argc,char * argv[])
{

FILE* fs,*ft;

fs = fopen(argv[1],"r");

How does the program know argv[1] is not NULL or for that matter that
it even exists?
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[1]);
exit(1);
}

ft = fopen(argv[2],"w");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[2]);
exit(1);
}

while( (ch=fgetc(fs)) != EOF)

Where is ch declared?
{

if(ch == 32)

32 is not the value of ' ' on my system.
{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);

On my system adding 127 to a printable character value will produce a
value that won't fit in a char. While this technically isn't overflow
since fputc takes an int, it will mess up the output file.

It appears to skip only one space. And it does so without regard to
whether the space is "significant".
}
else
fputc(ch,ft);

}

fclose(fs);
fclose(ft);

return EXIT_SUCCESS;
}

Now my questions are as follows:

1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman or LZW)?


Remove del for email
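
Several objections raised so far (the magic number 32, the lost trailing space, and output bytes that may collide with ordinary characters) can be addressed in one sketch. It is not the book's method: it assumes the input is 7-bit ASCII, rejects anything else, and adds 128 where the book adds 127 so that the high bit alone marks a squeezed pair. The names squeeze_spaces and restore_spaces are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Encode src[0..n-1] into dst (room for n bytes). A space followed by a
   7-bit character c becomes the single byte c + 128. Returns the number
   of bytes written, or (size_t)-1 if the input is not 7-bit text. */
size_t squeeze_spaces(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        if (src[i] >= 128)
            return (size_t)-1;       /* high bit is reserved as the marker */
        if (src[i] == ' ' && i + 1 < n) {
            if (src[i + 1] >= 128)
                return (size_t)-1;
            dst[out++] = (unsigned char)(src[i + 1] + 128);
            i++;                     /* the following character is consumed too */
        } else {
            dst[out++] = src[i];     /* also covers a space at the very end */
        }
    }
    return out;
}

/* Decode src[0..n-1] into dst (room for 2 * n bytes). Returns bytes written. */
size_t restore_spaces(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        if (src[i] >= 128) {         /* marker: a space plus the original character */
            dst[out++] = ' ';
            dst[out++] = (unsigned char)(src[i] - 128);
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

Because the encoded bytes are no longer text, both the compressed file and any file read back by the decoder would need to be opened in binary mode.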
 
sophia

Was it listed as a bad example?  Perhaps the book was intended as a
satire?

I don't know whether the book was intended as satire or not.
The book's ISBN is 81-7656-537-7, and this program is given on
page 55.
Where is ch declared?



32 is not the value of ' ' on my system.


On my system adding 127 to a printable character value will produce a
value that won't fit in a char.  While this technically isn't overflow
since fputc takes an int, it will mess up the output file.

It appears to skip only one space.  And it does so without regard to
whether the space is "significant".
I think that, to skip more than one space, the following changes can be
made (assuming 32 stands for ' '):


while( (ch=fgetc(fs)) != EOF)
{
if(ch == 32)
{
count = 1;
while( (ch=fgetc(fs)) == 32)
count++;
fputc(count+127,ft);
}
fputc(ch,ft);
}

Does fputc take a signed int or an unsigned int?
 
santosh

sophia said:
I don't know whether the book was intended as satire or not.
The book's ISBN is 81-7656-537-7, and this program is given on
page 55.

I think that, to skip more than one space, the following changes can be
made (assuming 32 stands for ' '):


while( (ch=fgetc(fs)) != EOF)
{
if(ch == 32)

Why not make this ASCII independent by replacing 32 with ' '?
{
count = 1;
while( (ch=fgetc(fs)) == 32)
count++;
fputc(count+127,ft);

And this is also implementation defined behaviour.
}
fputc(ch,ft);
}

Does fputc take a signed int or an unsigned int?

It takes a signed int argument, but converts that to an unsigned char
before writing to the stream. If the write fails it returns EOF,
otherwise the character it wrote converted to int.
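
The conversion described above can be mimicked without touching a file. The helper byte_as_written is hypothetical, and the spot checks assume 8-bit bytes (UCHAR_MAX equal to 255):

```c
#include <assert.h>
#include <limits.h>

/* Mirror the conversion fputc applies to its int argument:
   the value is reduced modulo UCHAR_MAX + 1. */
int byte_as_written(int c)
{
    return (unsigned char)c;
}

/* Spot checks, valid when UCHAR_MAX == 255. */
void check_examples(void)
{
    assert(byte_as_written(100 + 127) == 227);  /* fits, so it is unchanged */
    assert(byte_as_written(200 + 127) == 71);   /* 327 wraps around to 71 */
    assert(byte_as_written(71) == 71);          /* same byte as the line above */
    assert(byte_as_written(-1) == 255);         /* a negative value wraps as well */
}
```

The second and third checks show why ch+127 is risky: two different arguments can end up as the same byte in the file, and the decoder has no way of telling them apart.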
 
Barry Schwarz

I don't know whether the book was intended as satire or not.
The book's ISBN is 81-7656-537-7, and this program is given on
page 55.

I think that, to skip more than one space, the following changes can be
made (assuming 32 stands for ' '):

Why assume something known to be false when the expression ' ' will
work every time?
while( (ch=fgetc(fs)) != EOF)
{
if(ch == 32)
{
count = 1;
while( (ch=fgetc(fs)) == 32)

What happens if the last three characters in the stream are blank?
count++;
fputc(count+127,ft);
}
fputc(ch,ft);
}
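
In the loop quoted above, a run of spaces that reaches the end of the stream leaves ch equal to EOF, and the final fputc(ch,ft) then writes a spurious byte (typically 255). The count is also unbounded, so count+127 can exceed what one byte holds. A sketch that avoids both problems follows; it keeps the count + 127 marker but assumes 7-bit ASCII input, caps each run at 128, and works on memory buffers. The names rle_spaces and unrle_spaces are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Encode src[0..n-1] into dst (room for n bytes). A run of k spaces,
   1 <= k <= 128, becomes the single byte 127 + k; longer runs are split.
   Returns bytes written, or (size_t)-1 if the input is not 7-bit text. */
size_t rle_spaces(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        if (src[i] >= 128)
            return (size_t)-1;       /* 128..255 are reserved for run markers */
        if (src[i] == ' ') {
            size_t run = 0;
            while (i < n && src[i] == ' ' && run < 128) {
                run++;
                i++;
            }
            dst[out++] = (unsigned char)(127 + run);
        } else {
            dst[out++] = src[i++];
        }
    }
    return out;
}

/* Decode src[0..n-1] into dst (room for 128 * n bytes in the worst case).
   Returns bytes written. */
size_t unrle_spaces(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t out = 0;
    size_t i, k;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        if (src[i] >= 128) {
            for (k = src[i] - 127u; k > 0; k--)
                dst[out++] = ' ';
        } else {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

A single space still costs one byte, so this scheme only saves anything on runs of two or more spaces.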

Does fputc take a signed int or an unsigned int?


 
