compressing charatcers

jay · Apr 2, 2014

PROBLEM: You are given a string FOOFIGHTERS. You have to come up with an algorithm that will compress this string. You also have to make sure that youare not using extra memory. For example: FOOFIGHTERS will be compressed asFO2FIGHTERS. You should not use another array or bitfield to keep a frequency count for the individual letters.

WHAT I DID: coded it in C. It is working fine.

WHAT I WANT from c.l.c: This is written purely to learn C. This is not home work and I am not getting any interview calls since an year

. I wroteit because I wanted to know C better

Is the C code written here is of good quality as per c.l.c standard ?
Can more improvements be done ?
Did I miss any C library which could solve my task in a better way ?
do you see any hidden bugs there ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXNUMSZ = 100 };

void get_string(char** pp, const char* t);
void add_num(char* s, const size_t begin, const size_t cnt, size_t* end);
void convert_num_to_str(char* p, const size_t num);
void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end);
void move_elements(char* s, size_t begin, size_t end);
void compress_string(char* str, const size_t sz);

int main(int argc, char* argv[])
{
size_t sz = 0;
char* p = NULL;
if(2 != argc)
{
printf("Please provide one argument\n");
exit(EXIT_FAILURE);
}

sz = strlen(argv[1]) + 1;
get_string(&p, argv[1]);
printf("You entered: [%s]\n", p);
compress_string(p, sz);
printf("Result = [%s]\n", p);
free(p);

return 0;
}

void get_string(char** pp, const char* t)
{
size_t len;
char* mp;

len = strlen(t) + 1;
mp = malloc(len * (sizeof *mp));
if(NULL == mp)
{
printf("IN: %s @ %d: Out of Memory\n", __FILE__, __LINE__);
*pp = NULL;
exit(EXIT_FAILURE);
}
*pp = mp;
strcpy(*pp, t);
}

void compress_string(char* str, const size_t sz)
{
size_t bidx = 0;
size_t idx = 0;
size_t cnt = 0;
char prev = '\0';
char c = '\0';

for(idx = 0; idx < sz; ++idx)
{
c = str[idx];
/* printf("c = [%c], prev = [%c]\n", c, prev); */
if(c == prev)
{
++cnt;
}
else if(cnt > 1)
{
add_num(str, bidx, cnt, &idx);
printf("Array = %s\n", str);
printf("New Index = %u\n\n", idx);
--idx; /* for loop will increment it anyhow */
cnt = 0;
}
else
{
prev = c;
cnt = 1;
bidx = idx;
}
}
}

void add_num(char* s, const size_t begin, const size_t cnt, size_t* end)
{
char strnum[MAXNUMSZ+1] = {0};
size_t bidx = begin;
size_t eidx = *end;
printf("Repitition begins at [%u] = %c, ends = %u\n", begin, s[begin],*end);
convert_num_to_str(strnum, cnt);
printf("String to add = %s\n", strnum);
add_count_to_str(s, strnum, &bidx, &eidx);
move_elements(s, bidx, *end);
*end = eidx;
}

void move_elements(char* s, size_t begin, size_t end)
{
for(; s[end]; ++end)
{
s[begin++] = s[end];
}
s[begin] = '\0';
}

void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end)
{
(*begin)++;
for(;*snum; ++snum)
{
s[(*begin)++] = *snum;
}
/* reset the index */
*end = *begin;
}

void convert_num_to_str(char* p, const size_t num)
{
int ret = 0;
ret = sprintf(p, "%u", num);
if(0 > ret)
{
printf("IN: %s @%d: SPRINTF() ERROR\n", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
}
=============== OUTPUT =============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra compress.c
[myhome]$ ./a.out bbfqqqRRRRtLL
You entered: [bbfqqqRRRRtLL]
Repetition begins at [0] = b, ends = 2
String to add = 2
Array = b2fqqqRRRRtLL
New Index = 2

Repetition begins at [3] = q, ends = 6
String to add = 3
Array = b2fq3RRRRtLL
New Index = 5

Repetition begins at [5] = R, ends = 9
String to add = 4
Array = b2fq3R4tLL
New Index = 7

Repetition begins at [8] = L, ends = 10
String to add = 2
Array = b2fq3R4tL2
New Index = 10

Result = [b2fq3R4tL2]

Stefan Ram · Apr 2, 2014

jay said:
PROBLEM: You are given a string FOOFIGHTERS. You have to come
up with an algorithm that will compress this string.

I am using a VLE (variable-length encoding).

When the first bit is 1, the bitsequence encodes the string
»FOOFIGHTERS«.

When the first bit is 0, the bitsequence encodes the text
that follows as a UTF-8 string (terminated by the first bit
pattern that is not allowed in UTF-8).

With this VLE, I can encode the string »FOOFIGHTERS« with
just on bit, i.e.: »1«.

Stefan Ram · Apr 2, 2014

Richard Damon said:
Now, if you have a definition of some limitations on what the input
string can contain, then perhaps you can use that to allow you to always
compress.

Yes.

Strings contain characters from a pre-determined character
set. If the input is known to be ASCII, but CHAR_BIT == 8,
we can always pack 8 characters into 7 char objects.

Also, the limitation does not have to be pre-programmed.
The program can start with no assumptions and then find
rules to describe the input itself at runtime.

David Brown · Apr 2, 2014

Hi,

Let me give you a few comments - I am sure others here will come up with
more. There is also a good chance that people will disagree with my
comments, including my comment about how much people here disagree.
Welcome to comp.lang.c!

gcc -ansi -pedantic -Wall -Wextra compress.c

It's good to see you using -Wall and -Wextra. But ANSI C is getting
very old - if you are starting C now, there is little reason not to
start with a more modern standard (unless you are intending to work on
existing code using an old standard). So use "-std=c11" rather than
"-ansi". If it is important to you that your code is strictly according
to the standards, then "-pedantic" makes sense. But if you are always
targeting gcc, then it makes sense to drop the "-pedantic" and use
"-std=gnu11" - this gives you access to many gcc extensions which can
give you better (but non-standard) code.

Enable optimisation (either -O1, -O2 or -Os. Don't bother with -O3, it
seldom helps, and don't bother with the more advanced flags unless you
want more advanced optimisation). You might not care about size and
speed at the moment, but compiling without optimising is like running a
marathon with your legs tied together. So learn to use optimisation
flags from the start (although sometimes you might want to disable
optimisation during debugging). Also, gcc can do a better job at some
warnings if optimisation is enabled, since it is doing more advanced
analysis. It will also help your learning - some subtle mistakes in
your code might give the desired code output without optimisation, but
give you unexpected crashes when optimising.

Put a space after "if", "for", etc. These are keywords, not functions.
Other than that, your spacing looks good.

Many people, myself included, prefer to put the brackets on the same
line as the conditional:

if (x) {
...
} else if (y) {
...
} else {
...
}

It's a matter of taste, but I think it is clearer (possibly simply
because I am used to it) and it is a little more compact.

There is no need to declare functions at the start of your code. You
only need to declare them if you use them before defining them - by
re-ordering your functions (such as putting main() at the end), you
avoid this. Again, it's a matter of taste, but it is very common to
order functions and avoid these extra declarations (occasionally they
are unavoidable).

Any functions (or data) that you define that is not going to be exported
outside the file should be declared "static". It helps avoid mistakes
and name-space collisions between files, makes linking faster, and gives
the compiler more scope for optimisation.

Don't use Yoda-style conditionals - if you want to check that "argc" is
not 2, write "if (argc != 2) ...". It is more natural, and thus
clearer and less likely to have an error. The idea of Yoda style "if (2
== argc)" to catch typos like "if (argc = 2)" was outdated long ago when
compilers (and linters) started warning about such typos. Let the
compiler do its job and catch these errors with -Wall.

PROBLEM: You are given a string FOOFIGHTERS. You have to come up with an algorithm that will compress this string. You also have to make sure that you are not using extra memory. For example: FOOFIGHTERS will be compressed as FO2FIGHTERS. You should not use another array or bitfield to keep a frequency count for the individual letters.

WHAT I DID: coded it in C. It is working fine.

WHAT I WANT from c.l.c: This is written purely to learn C. This is not home work and I am not getting any interview calls since an year . I wrote it because I wanted to know C better

Is the C code written here is of good quality as per c.l.c standard ?
Can more improvements be done ?
Did I miss any C library which could solve my task in a better way ?
do you see any hidden bugs there ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXNUMSZ = 100 };

void get_string(char** pp, const char* t);
void add_num(char* s, const size_t begin, const size_t cnt, size_t* end);
void convert_num_to_str(char* p, const size_t num);
void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end);
void move_elements(char* s, size_t begin, size_t end);
void compress_string(char* str, const size_t sz);

int main(int argc, char* argv[])
{
size_t sz = 0;
char* p = NULL;
if(2 != argc)
{
printf("Please provide one argument\n");
exit(EXIT_FAILURE);
}

sz = strlen(argv[1]) + 1;
get_string(&p, argv[1]);
printf("You entered: [%s]\n", p);
compress_string(p, sz);
printf("Result = [%s]\n", p);
free(p);

return 0;
}

void get_string(char** pp, const char* t)
{
size_t len;
char* mp;

len = strlen(t) + 1;
mp = malloc(len * (sizeof *mp));
if(NULL == mp)
{
printf("IN: %s @ %d: Out of Memory\n", __FILE__, __LINE__);
*pp = NULL;
exit(EXIT_FAILURE);
}
*pp = mp;
strcpy(*pp, t);
}

void compress_string(char* str, const size_t sz)
{
size_t bidx = 0;
size_t idx = 0;
size_t cnt = 0;
char prev = '\0';
char c = '\0';

for(idx = 0; idx < sz; ++idx)
{
c = str[idx];
/* printf("c = [%c], prev = [%c]\n", c, prev); */
if(c == prev)
{
++cnt;
}
else if(cnt > 1)
{
add_num(str, bidx, cnt, &idx);
printf("Array = %s\n", str);
printf("New Index = %u\n\n", idx);
--idx; /* for loop will increment it anyhow */
cnt = 0;
}
else
{
prev = c;
cnt = 1;
bidx = idx;
}
}
}

void add_num(char* s, const size_t begin, const size_t cnt, size_t* end)
{
char strnum[MAXNUMSZ+1] = {0};
size_t bidx = begin;
size_t eidx = *end;
printf("Repitition begins at [%u] = %c, ends = %u\n", begin, s[begin], *end);
convert_num_to_str(strnum, cnt);
printf("String to add = %s\n", strnum);
add_count_to_str(s, strnum, &bidx, &eidx);
move_elements(s, bidx, *end);
*end = eidx;
}

void move_elements(char* s, size_t begin, size_t end)
{
for(; s[end]; ++end)
{
s[begin++] = s[end];
}
s[begin] = '\0';
}

void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end)
{
(*begin)++;
for(;*snum; ++snum)
{
s[(*begin)++] = *snum;
}
/* reset the index */
*end = *begin;
}

void convert_num_to_str(char* p, const size_t num)
{
int ret = 0;
ret = sprintf(p, "%u", num);
if(0 > ret)
{
printf("IN: %s @%d: SPRINTF() ERROR\n", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
}
=============== OUTPUT =============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra compress.c
[myhome]$ ./a.out bbfqqqRRRRtLL
You entered: [bbfqqqRRRRtLL]
Repetition begins at [0] = b, ends = 2
String to add = 2
Array = b2fqqqRRRRtLL
New Index = 2

Repetition begins at [3] = q, ends = 6
String to add = 3
Array = b2fq3RRRRtLL
New Index = 5

Repetition begins at [5] = R, ends = 9
String to add = 4
Array = b2fq3R4tLL
New Index = 7

Repetition begins at [8] = L, ends = 10
String to add = 2
Array = b2fq3R4tL2
New Index = 10

Result = [b2fq3R4tL2]

Keith Thompson · Apr 2, 2014

jay said:
PROBLEM: You are given a string FOOFIGHTERS. You have to come up with
an algorithm that will compress this string. You also have to make
sure that you are not using extra memory. For example: FOOFIGHTERS
will be compressed as FO2FIGHTERS. You should not use another array or
bitfield to keep a frequency count for the individual letters.

Not commenting (yet) on your C, but ...

It's difficult to be sure just what the algorithm is based on a single
example. You should define the algorithm unambiguously in English.

"FO2FIGHTERS" is just as long as "FOOFIGHTERS". This particular
algorithm doesn't actually compress the input at all unless it contains
a run of at least 3 of the same character.

You'll also need to decide what to do with a run of 10 or more
characters. "XXXXXXXXXX" will *probably* compress to "X10", but it
should be stated explicitly.

Assuming that the algorithm is:

Replace each run of 2 or more of the same character by a single
instance of that character followed by the count in decimal

it has the interesting characteristic that every input string is mapped
to an output string that's the same length or shorter.

It has the even more interesting characterstic that there's no unique
reverse mapping; the compression loses information. Both "FOOFIGHTERS"
and "FO2FIGHTERS" map to the same output string.

To solve that, you need a way to indicate whether a decimal string is a
count or just a sequence of digits copied from the input. This added
information means that at least some input strings will result in
*longer* output strings; this is unavoidable for any non-lossy
compression algorithm. (Unless you require a more limited character set
for the input than for the output.)

BartC · Apr 2, 2014

Stefan Ram said:
I am using a VLE (variable-length encoding).

When the first bit is 1, the bitsequence encodes the string
»FOOFIGHTERS«.

When the first bit is 0, the bitsequence encodes the text
that follows as a UTF-8 string (terminated by the first bit
pattern that is not allowed in UTF-8).

With this VLE, I can encode the string »FOOFIGHTERS« with
just on bit, i.e.: »1«.

How do you get back the string "FOOFIGHTERS" given just one bit?

If you need to store "FOOFIGHTERS" in the decoder, then I'm not sure that
will count.

Keith Thompson · Apr 2, 2014

BartC said:
How do you get back the string "FOOFIGHTERS" given just one bit?

If the bit is 1, return "FOOFIGHTERS".

If you need to store "FOOFIGHTERS" in the decoder, then I'm not sure that
will count.

Why not? It's just a compression algorithm that's highly optimized for
one particular input. (Other algorithms might be optimized for other
subsets of the possible inputs.)

Malcolm McLean · Apr 2, 2014

To solve that, you need a way to indicate whether a decimal string is a
count or just a sequence of digits copied from the input. This added
information means that at least some input strings will result in
*longer* output strings; this is unavoidable for any non-lossy
compression algorithm. (Unless you require a more limited character set
for the input than for the output.)

That's an inherent problem. But you can always store a single bit which
indicates that the following is plaintext, so your maximum inflation is just
one bit.

BartC · Apr 2, 2014

Keith Thompson said:
If the bit is 1, return "FOOFIGHTERS".

Why not? It's just a compression algorithm that's highly optimized for
one particular input. (Other algorithms might be optimized for other
subsets of the possible inputs.)

OK, fair enough. Here's the result of my attempt to compress all 12 million
books in the British Library:

1

But somehow, I doubt that's what the OP was looking for.

glen herrmannsfeldt · Apr 2, 2014

You might look at comp.compression for actually discussing
compression algorithms.

(snip)

First point, without more detail, in the strictest sense, the problem is
impossible. If you have an input string, which can contain any arbitrary
set of values, you can not compress it to always take less space. If you
can compress some input to be larger.

It is possible to guarantee that the compressed data isn't more than
one bit larger, though. (And you might find a single unused bit
lying around.) The Ultrium tape format allows for compressing data.
The drive compresses a data block, and compares its length to the
orignal. If it got bigger, it writes the original, and a bit in
the block header indicates which blocks are compressed. Tape blocks
have enough overhead that there is likely space for a bit in there
somewhere.

Other tape formats (I have seen DLT do it) can increase the size
by about 10% in some cases. If you know that data is already
compressed, don't use compression mode.

(snip)

Secondly, In the strictest sense, using "extra" memory as in memory
outside of the data buffer is also impossible, as any program you write
is going to need storage for SOME information outside of the array, if
only it is where in the array it currently is processing. If you
consider the last sentence to be a correction/clarification on the "no
extra memory" requirement, then you might have a chance.

It is common in describing the theory of algorithms to consider
both run time and memory requirements, usually in O() (bit oh)
notation. Normally you can ignore any O(1) memory requirements,
likely even O(log N) requirements, but often not O(N) or higher.

You will probably not be able to use much of the library to
help you, as the library is "black box", and thus it would be
hard to know if the library might break your "extra memory" rule.

-- glen

Barry Schwarz · Apr 2, 2014

Ignoring any possible issues with your algorithm and addressing only
the "quality" of the C code:

It would make you code much easier to read and help you perform
any later maintenance if you learned to indent consistently by a
visibly significant amount.

You call strlen to compute the length of the command line argument
twice, once in main and again in get_string. Since you use the value
in both, you could either pass the value to get_string as an argument
or have get_string return the value to main for its use.

In get_string, there is no point in assigning a value to *pp if
the next statement is a call to exit.

In get_string, mp is an unnecessary local variable. You can
replace it with *pp and achieve the same results.

In compress_string, you attempt to print idx with %u. idx is a
size_t which is unsigned but not necessarily int. To avoid undefined
behavior, you should either cast the value to unsigned int or use %zu
as the conversion specification. compress_string, add_num, and
convert_num_to_str have this problem also.

MAXNUMSZ is way too large. It is the number of characters in a
string representing the number of times a character is repeated. Do
you really expect a character to be repeated several billion billion
billion times. Defining it as 5 would allow for over 9 thousand
repetitions which should be adequate.

In convert_num_to_str, you might want to consider using snprintf
instead of sprintf to insure you never overrun the array pointed to by
p.

In compress_string, the for loop processes the full length of the
input string even after the string has been shortened as the result of
processing a previous set of repeated characters. This results in
some substrings being processed more than once needlessly. Processing
the string like abbbbbbbcddddddde\0 produces ab7cd7e\0cd7e\0d3e\0\0.
It doesn't show up in your output because printf ignores any
characters after the first '\0'.

PROBLEM: You are given a string FOOFIGHTERS. You have to come up with an algorithm that will compress this string. You also have to make sure that you are not using extra memory. For example: FOOFIGHTERS will be compressed as FO2FIGHTERS. You should not use another array or bitfield to keep a frequency count for the individual letters.

WHAT I DID: coded it in C. It is working fine.

WHAT I WANT from c.l.c: This is written purely to learn C. This is not home work and I am not getting any interview calls since an year . I wrote it because I wanted to know C better

Is the C code written here is of good quality as per c.l.c standard ?
Can more improvements be done ?
Did I miss any C library which could solve my task in a better way ?
do you see any hidden bugs there ?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXNUMSZ = 100 };

void get_string(char** pp, const char* t);
void add_num(char* s, const size_t begin, const size_t cnt, size_t* end);
void convert_num_to_str(char* p, const size_t num);
void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end);
void move_elements(char* s, size_t begin, size_t end);
void compress_string(char* str, const size_t sz);

int main(int argc, char* argv[])
{
size_t sz = 0;
char* p = NULL;
if(2 != argc)
{
printf("Please provide one argument\n");
exit(EXIT_FAILURE);
}

sz = strlen(argv[1]) + 1;
get_string(&p, argv[1]);
printf("You entered: [%s]\n", p);
compress_string(p, sz);
printf("Result = [%s]\n", p);
free(p);

return 0;
}

void get_string(char** pp, const char* t)
{
size_t len;
char* mp;

len = strlen(t) + 1;
mp = malloc(len * (sizeof *mp));
if(NULL == mp)
{
printf("IN: %s @ %d: Out of Memory\n", __FILE__, __LINE__);
*pp = NULL;
exit(EXIT_FAILURE);
}
*pp = mp;
strcpy(*pp, t);
}

void compress_string(char* str, const size_t sz)
{
size_t bidx = 0;
size_t idx = 0;
size_t cnt = 0;
char prev = '\0';
char c = '\0';

for(idx = 0; idx < sz; ++idx)
{
c = str[idx];
/* printf("c = [%c], prev = [%c]\n", c, prev); */
if(c == prev)
{
++cnt;
}
else if(cnt > 1)
{
add_num(str, bidx, cnt, &idx);
printf("Array = %s\n", str);
printf("New Index = %u\n\n", idx);
--idx; /* for loop will increment it anyhow */
cnt = 0;
}
else
{
prev = c;
cnt = 1;
bidx = idx;
}
}
}

void add_num(char* s, const size_t begin, const size_t cnt, size_t* end)
{
char strnum[MAXNUMSZ+1] = {0};
size_t bidx = begin;
size_t eidx = *end;
printf("Repitition begins at [%u] = %c, ends = %u\n", begin, s[begin], *end);
convert_num_to_str(strnum, cnt);
printf("String to add = %s\n", strnum);
add_count_to_str(s, strnum, &bidx, &eidx);
move_elements(s, bidx, *end);
*end = eidx;
}

void move_elements(char* s, size_t begin, size_t end)
{
for(; s[end]; ++end)
{
s[begin++] = s[end];
}
s[begin] = '\0';
}

void add_count_to_str(char* s, const char* snum, size_t* begin, size_t* end)
{
(*begin)++;
for(;*snum; ++snum)
{
s[(*begin)++] = *snum;
}
/* reset the index */
*end = *begin;
}

void convert_num_to_str(char* p, const size_t num)
{
int ret = 0;
ret = sprintf(p, "%u", num);
if(0 > ret)
{
printf("IN: %s @%d: SPRINTF() ERROR\n", __FILE__, __LINE__);
exit(EXIT_FAILURE);
}
}
=============== OUTPUT =============
[myhome]$ gcc -ansi -pedantic -Wall -Wextra compress.c
[myhome]$ ./a.out bbfqqqRRRRtLL
You entered: [bbfqqqRRRRtLL]
Repetition begins at [0] = b, ends = 2
String to add = 2
Array = b2fqqqRRRRtLL
New Index = 2

Repetition begins at [3] = q, ends = 6
String to add = 3
Array = b2fq3RRRRtLL
New Index = 5

Repetition begins at [5] = R, ends = 9
String to add = 4
Array = b2fq3R4tLL
New Index = 7

Repetition begins at [8] = L, ends = 10
String to add = 2
Array = b2fq3R4tL2
New Index = 10

Result = [b2fq3R4tL2]

Osmium · Apr 2, 2014

:

It is common in describing the theory of algorithms to consider
both run time and memory requirements, usually in O() (bit oh)
notation.

Should be big oh, not bit oh.

Your problem statement is awful. You can find better ones on the web.
Make "c program" part of the query.

If you are interested in compression (you said you were not) look up Huffman
encoding. It is easy to undertand and fun to fool with.

Keith Thompson · Apr 2, 2014

Malcolm McLean said:
That's an inherent problem. But you can always store a single bit which
indicates that the following is plaintext, so your maximum inflation is just
one bit.

You can't always store a single bit if your file system only supports
whole numbers of bytes.

Keith Thompson · Apr 2, 2014

glen herrmannsfeldt said:
You might look at comp.compression for actually discussing
compression algorithms.

But watch out for the kooks who think they can compress arbitrary data
(something that's mathematically impossible). I seem to recall that the
FAQ is pretty good, though.

Kaz Kylheku · Apr 2, 2014

You can't always store a single bit if your file system only supports
whole numbers of bytes.

Most file systems support only a whole number of *blocks*.

But never mind that.

Even if the file system actually stores bytes rather than blocks,
it means that the bit streams which represent compressed files require
between 0 and 7 bits of padding. This means that 7 out of 8 times,
a compressed file has slack space which can absorb an extra bit.

So 7/8 files will stay the same length when extended by a bit,
and 1/8 will add an extra byte.

Over many files, this averages out to one bit per file, assuming
there is a uniform distribution of the eight possible padding lengths.

jay · Apr 3, 2014

On Thursday, April 3, 2014 1:17:26 AM UTC+5:30, Barry Schwarz wrote:

I agree with most posters here that problem statement is badly written and I should have written my own explanation than taking one from web. Its nameis compress but it does not compress anything. It is just a program to which we input a string, containing only letters, and it replaces 2 or more occurrences of a letter with a number e.g.

ppppplllerr -> p5l3er2
avatar -> avatar
queen -> que2n

so, string may or may not shorten in length. This question was in Google orMicrosoft or amazon interview. I got it from careercup.com. I think the objective of the interviewer was to check how fast/better a person thinks when it comes to solving a problem. It took me around 12 hours to code this. I guess they must be wanting the coded solution in 20 minutes or something.I have never been able to solve these kinds of problems and my C is still not that good. I wanted to become better at problem solving using C and that is the reason I did this.

It would make you code much easier to read and help you perform
any later maintenance if you learned to indent consistently by a
visibly significant amount.

I tried very hard but the tools I have here are not under my control. I usevery, very and very old tools and it is extremely difficult to type/indent/copy/delete/yank the text in the tool I use which somehow works like a text editor. I am really sorry, I can't do better than that. I do get your point. On my personal computer I use emacs with indent-tab-mode set to nil. I know how much indentation and spacing is important. unfortunately, my access is limited when I don't have my personal computer.

In compress_string, you attempt to print idx with %u.
idx is a size_t which is unsigned but not necessarily
int. To avoid undefined behavior, you should either
cast the value to unsigned int or use %zu

As per standard (n1570), section 7.19, size_t is unsigned int type. 2nd, I can not use %zu in C90 mode. I use C90 standard because that is what most of posters use and advise here and 2nd the tools I have here are 14 years old at minimum:

#include <stdio.h>

int main(void)
{
size_t d = 9;
printf("d = %zu\n", d);

return 0;
}
================== OUTPUT ==================
[myhome]$ cd programs/c
[myhome]$ gcc -ansi -pedantic -Wall -Wextra test.c
test.c: In function `main':
test.c:9: warning: ISO C90 does not support the `z' printf length modifier

In convert_num_to_str, you might want to consider using snprintf
instead of sprintf to insure you never overrun the array pointed
to by p.

snpritnf does not solve the issue of overwriting. I read that several timesin questions posted here. 2nd, no snprintf in C90 standard

In compress_string, the for loop processes the full length of the
nput string even after the string has been shortened as the result of
processing a previous set of repeated characters. This results in
some substrings being processed more than once needlessly. Processing
the string like abbbbbbbcddddddde\0 produces ab7cd7e\0cd7e\0d3e\0\0.
It doesn't show up in your output because printf ignores any
characters after the first '\0'.

I tried printing the output characters for size equal to the size of input and I got this: "ab7cd7ed3e2d3e". You seem to be right. I thought my for loop is not re-processing anything because I was ncrementing (and changing) the index. I was wrong. But how do you solve this ? I got no idea.

Barry Schwarz · Apr 3, 2014

As per standard (n1570), section 7.19, size_t is unsigned int type. 2nd, I can not use %zu in C90 mode.
I use C90 standard because that is what most of posters use and advise here and 2nd the tools I have
here are 14 years old at minimum:

As per standard (n1570), section 7.19, size_t is unsigned int type. 2nd, I can not use %zu in C90 mode. I use C90
standard because that is what most of posters use and advise here and 2nd the tools I have here are 14 years old at minimum:

You are misreading the standard. section 7.19 says "size_t which is
the unsigned integer type of the result of the sizeof operator;" Note
that it says integer, not int.

The contents of the C11 standard are irrelevant if you are using a
compiler limited to C90.

The C90 standard says the same thing in section 4.1.5: "size_t
which is the unsigned integral type of the result of the sizeof
operator;"

On many C90 systems, size_t is actually unsigned long.

If you cannot use %zu, then cast the value to int as suggested and
continue using %u.

Ian Collins · Apr 3, 2014

Barry said:
If you cannot use %zu, then cast the value to int as suggested and
continue using %u.

Cast the value to *unsigned int*.

Stefan Ram · Apr 3, 2014

Barry Schwarz said:
On many C90 systems, size_t is actually unsigned long.
If you cannot use %zu, then cast the value to int

unsigned long

as suggested and continue using %u.

%lu

Barry Schwarz · Apr 3, 2014

I tried printing the output characters for size equal to the size of input and I got this: "ab7cd7ed3e2d3e". You seem to
be right. I thought my for loop is not re-processing anything because I was ncrementing (and changing) the index. I was
wrong. But how do you solve this ? I got no idea.

One way would be to change the terminating condition in the for
statement from "idx<sz" to "str[idx]" which would stop the loop at the
first \0 encountered.

RSA implementation issues in public key pem loader function	0	May 21, 2025
Universal BMP Steganography Tool (AES-128-CTR + SP800-90A CSPRNG) Full Encoder/Decoder with 3LSB Payload, PasswordDerived Key & External Key File	4	Mar 26, 2026
Fibonacci	0	May 13, 2023
Database Manager: A C++ Console Application	14	May 12, 2025
Implementing a Q-Learning Algorithm with Logistic Regression Normalization in C++	0	Jun 4, 2025
AES-128 Clipboard Protector: Auto-Encrypt Ctrl+C, Smart-Decrypt Ctrl+V (C++ Windows Hook)	7	Mar 24, 2026
Need Helping adding Square root code to an existing calculator. (Absolute begginer?)	0	Jan 12, 2025
Building a Professional Neural Network Framework: Full C++ Implementation with Windows GUI and Real-Time Training Visualization	11	Jun 14, 2026

compressing charatcers

jay

Stefan Ram

Stefan Ram

David Brown

Keith Thompson

BartC

Keith Thompson

Malcolm McLean

BartC

glen herrmannsfeldt

Barry Schwarz

Osmium

Keith Thompson

Keith Thompson

Kaz Kylheku

jay

Barry Schwarz

Ian Collins

Stefan Ram

Barry Schwarz

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads