Can we replace 8 bits by 2 bits?

Discussion in 'C++' started by Umesh, Jan 5, 2007.

1. Umesh (Guest)

This is a basic thing.
Say A = 0100 0001 in ASCII, which deals with 256 characters (you know
better than me!).
But we deal with only four characters, and 2 bits are enough to encode
them. I want to confirm whether we can encode A in 2 bits (say 00), B in 2
bits (01), C in 2 bits (10) and D in 2 bits (11) by some program. I only use
these four letters in my work. Can you please write a sample program to reach
my goal?

Umesh, Jan 5, 2007

2. jacob navia (Guest)

Umesh wrote:
> [snip]

Dear customer

We are ready to fulfill your request, and we thank you for the
confidence you give us by placing your order.

Once your card payment is accepted, we will gladly send you the requested
program.

Yours sincerely

J.K. OB

Customer support.

P.S.
www.DoMyHomework.com

jacob navia, Jan 5, 2007

3. Ondra Holub (Guest)

Umesh wrote:
> [snip]

Yes, you can encode it this way, but you would have problems working
with it. For example, accessing the 6th character of such an array would be
something like this (I did not test the following code):

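#include <stddef.h>

/* Return the 2-bit code (0..3) stored at position index:
   four codes per byte, lowest-order bits first. */
char GetNthChar(const unsigned char a[], size_t index)
{
    const size_t im4 = index % 4 * 2;                /* bit offset in the byte */
    return (char)((a[index / 4] & (3 << im4)) >> im4); /* isolate, shift down */
}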

However, it may be useful to store your data in such a format. So I would
recommend encoding it before storing, decoding it after loading, and working
with an ordinary char array containing only the values 0, 1, 2 and 3.
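A minimal C++ sketch of the encode-before-storing, decode-after-loading approach. It assumes the input contains only 'A' to 'D' and uses the same layout as GetNthChar (first value in the lowest-order bits). The names pack and unpack are illustrative, not from any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack letters 'A'..'D' as 2-bit codes, four per byte, lowest bits first.
// Assumes every character is one of 'A', 'B', 'C', 'D'.
std::vector<unsigned char> pack(const std::string& letters)
{
    std::vector<unsigned char> out((letters.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < letters.size(); ++i) {
        unsigned code = static_cast<unsigned>(letters[i] - 'A');  // 'A'->0 ... 'D'->3
        out[i / 4] |= static_cast<unsigned char>(code << (i % 4 * 2));
    }
    return out;
}

// Inverse of pack(). count is the number of letters originally packed;
// it is needed because the last byte may be only partly used.
std::string unpack(const std::vector<unsigned char>& bytes, std::size_t count)
{
    std::string letters;
    letters.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        unsigned code = (bytes[i / 4] >> (i % 4 * 2)) & 3u;  // one 2-bit field
        letters.push_back(static_cast<char>('A' + code));
    }
    return letters;
}
```

Because unpack() needs the letter count, a real file format would also have to record that count somewhere; how to do so is an assumption left open here.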

If you really need to save a couple of bytes, you should write a
wrapper class which overloads operator[] and hides all these bit
operations. It would be something like std::vector<bool>, not for 1-bit
values but for 2-bit values.
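A rough C++ sketch of such a wrapper; the class name QuadVector and its members are hypothetical. It stores four 2-bit values per byte. Reads go through a const operator[]; writes go through set(), because giving the non-const operator[] assignment semantics, as std::vector<bool> does, would need a proxy reference class that is left out here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container of 2-bit values (0..3), packed four per byte.
class QuadVector {
public:
    explicit QuadVector(std::size_t n = 0) : size_(n), bytes_((n + 3) / 4) {}

    std::size_t size() const { return size_; }

    // Read-only access to value i.
    unsigned operator[](std::size_t i) const
    {
        return (bytes_[i / 4] >> shift(i)) & 3u;
    }

    // Overwrite value i: clear its two bits, then OR in the new value.
    void set(std::size_t i, unsigned value)
    {
        assert(i < size_ && value < 4);
        unsigned char& b = bytes_[i / 4];
        b = static_cast<unsigned char>((b & ~(3u << shift(i))) | (value << shift(i)));
    }

    // Append a value, adding a new byte every fourth element.
    void push_back(unsigned value)
    {
        if (size_ % 4 == 0)
            bytes_.push_back(0);
        ++size_;
        set(size_ - 1, value);
    }

    // Bytes used for storage: about a quarter of one-value-per-byte.
    std::size_t bytes_used() const { return bytes_.size(); }

private:
    static unsigned shift(std::size_t i) { return static_cast<unsigned>(i % 4 * 2); }

    std::size_t size_;
    std::vector<unsigned char> bytes_;
};
```

For n values this uses (n + 3) / 4 bytes instead of n, at the cost of a shift and a mask on every access.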

Ondra Holub, Jan 5, 2007
4. osmium (Guest)

"Umesh" wrote:

> [snip]

The switch statement *may* be germane to your problem.
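One way a switch statement could be applied, sketched in C++ (code_of and letter_of are illustrative names): map each letter to its 2-bit code and back, with a sentinel for anything outside the four-letter alphabet.

```cpp
// Letter -> 2-bit code; -1 for anything that is not 'A'..'D'.
int code_of(char c)
{
    switch (c) {
    case 'A': return 0;   // binary 00
    case 'B': return 1;   // binary 01
    case 'C': return 2;   // binary 10
    case 'D': return 3;   // binary 11
    default:  return -1;  // not one of the four letters
    }
}

// 2-bit code -> letter, for decoding; '?' marks an invalid code.
char letter_of(int code)
{
    switch (code) {
    case 0:  return 'A';
    case 1:  return 'B';
    case 2:  return 'C';
    case 3:  return 'D';
    default: return '?';
    }
}
```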

BTW, ASCII does not deal with 256 characters; ASCII consists of only 128
characters. Terminology is a bitch, since it is often used improperly by the
very people who in fact know better. It's "quicker" that way. :-(

osmium, Jan 5, 2007
5. mlimber (Guest)

[cross-posting deleted]

Ondra Holub wrote:
[...]
> If you really need to save couple of bytes, you should write some
> wrapping class which overloads operator[] and hides all these bit
> operations. It would be something like std::vector<bool> not for 1 bit
> values, but for 2 bit values.

Cheers! --M

mlimber, Jan 5, 2007
6. Umesh (Guest)

Suppose that I define an array of 2-bit values {00, 01, 10, 11}. Now when the
program finds A in the text file it replaces it with 00, B with 01, C with
10 and D with 11. So the encoded file will take 1/8 of the space of the
original file.

During decoding I'll replace 00 by A, 01 by B, 10 by C and 11 by D to
regain the original file.

I'm an inexperienced programmer. Please help.

Umesh, Jan 5, 2007
7. Lew Pitcher (Guest)

Umesh wrote:
> Suppose that I define an array of 2 bits {00,01,10,11} . Now when the
> program finds A in the text file it replaces with 00, B with 01, C with
> 10 and D with 11. So the encoded file will take 1/8 of the space of the
> original file.
>
> During decoding I'll replace 00 by A, 01 by B, 10 by C and 11 by D to
> regain the original file.
>
> I'm an inexperienced programmer. Pl help.

Your question doesn't really belong in comp.lang.c; what you need is
someone to teach you the rudiments of the skill of writing programs.

Write the code to do this:

    open the input file
    open the output file
    for each character in the input file
        if the character is 'A'
            write binary 00 to the output file
        else if the character is 'B'
            write binary 01 to the output file
        else if the character is 'C'
            write binary 10 to the output file
        else if the character is 'D'
            write binary 11 to the output file
        else
            do nothing
        end-if
        end-if
        end-if
        end-if
    end-for-loop
    close the output file
    close the input file
    terminate the program
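A sketch of how the pseudocode above might look in C++ with standard streams; encode_stream and encode_text are hypothetical names. A file can only be written a whole byte at a time, so "write binary 00" cannot be done literally. This version instead collects the pairs and writes one byte per four letters, first letter in the high-order bits (the order used in Richard Heathfield's 0xAD example further down the thread).

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Encode 'A'..'D' from in as 2-bit codes, four per byte, first letter in
// the high-order bits. Other characters are skipped ("do nothing").
// A final, partly filled byte is padded with zero bits on the right.
// Returns the number of letters encoded.
std::size_t encode_stream(std::istream& in, std::ostream& out)
{
    unsigned acc = 0;       // bit pairs collected so far
    int pairs = 0;          // how many pairs are in acc
    std::size_t count = 0;  // letters encoded
    char c;
    while (in.get(c)) {
        unsigned code;
        if (c == 'A')      code = 0;  // binary 00
        else if (c == 'B') code = 1;  // binary 01
        else if (c == 'C') code = 2;  // binary 10
        else if (c == 'D') code = 3;  // binary 11
        else continue;                // do nothing
        acc = (acc << 2) | code;
        ++count;
        if (++pairs == 4) {           // a full byte is ready
            out.put(static_cast<char>(acc));
            acc = 0;
            pairs = 0;
        }
    }
    if (pairs > 0)                    // flush the padded last byte
        out.put(static_cast<char>(acc << (2 * (4 - pairs))));
    return count;
}

// Convenience wrapper for in-memory text.
std::string encode_text(const std::string& text, std::size_t& count)
{
    std::istringstream in(text);
    std::ostringstream out;
    count = encode_stream(in, out);
    return out.str();
}
```

For real files, the output stream should be opened in binary mode, for example std::ofstream out("output.bin", std::ios::binary) (the file name is only an example). Since the padding bits decode as 'A', the returned count has to be saved somewhere for a decoder to know where the data ends.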

HTH
--
Lew

Lew Pitcher, Jan 5, 2007
8. Richard Heathfield (Guest)

Umesh said:

> This is a basic thing.
> Say A=0100 0001 in ASCII which deals with 256 characters

128, actually, but I know what you mean. Certainly 8 bits are sufficient to
encode 256 characters, which is what you actually care about.

> But we deal with only four characters and 2 bits are enough to encode
> them. I want to confirm if we can encode A in 2bits(say 00), B in 2
> bits (01), C in 2 bits(10) and D in 2 bits by some program. I only use
> this four alphabet in my work. Can u pl write a sample program to reach
> my goal?

Here's some code to split a byte into four:

void decode(char *letter, int ch)
{
const char alphabet[] = "ABCD";

for(i = 0; i < 4; i++)
{
letter = alphabet[(ch & mask) >> (i * 2)];
}
}

letter must point to the first element in an array of at least four chars.
Note that decode() does not build a string. If you want a string, deal with
the null terminator yourself.

If you are decoding, say, 0xAD, this is 10101101 in binary, and at the end
of the decoding process letter[0] will store 'C', letter[1] will store 'C',
letter[2] will store 'D', and letter[3] will store 'B'.
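As posted, decode() above does not compile: i and mask are never declared, and the loop assigns to the pointer letter rather than to letter[i]. Here is a corrected sketch in C++ with the same signature. Instead of using a mask, it shifts ch so that each bit pair lands in the low two bits. It produces the 'C', 'C', 'D', 'B' result described for 0xAD.

```cpp
// letter must point to at least four chars; no null terminator is written.
// The highest-order bit pair is decoded first.
void decode(char *letter, int ch)
{
    const char alphabet[] = "ABCD";

    for (int i = 0; i < 4; i++)
    {
        int shift = 6 - 2 * i;                    // 6, 4, 2, 0
        letter[i] = alphabet[(ch >> shift) & 3];  // isolate one 2-bit field
    }
}
```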

Encoding is quite easy too. Simply reverse the process. For encoding,
though, you may find it convenient to have an alphabet array of UCHAR_MAX +
1 bytes, all of which have the value 0, but set alphabet['A'] to 1,
alphabet['B'] to 2, alphabet['C'] to 3, and alphabet['D'] to 4. Then you
can say: if(alphabet[letter] == 0) { error - invalid code } else { your
OR-mask is alphabet[letter] - 1, which you can shift into position and OR
into your encoding }
--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at the above domain, - www.

Richard Heathfield, Jan 5, 2007
9. Richard Heathfield (Guest)

Umesh said:

> Suppose that I define an array of 2 bits {00,01,10,11} . Now when the
> program finds A in the text file it replaces with 00, B with 01, C with
> 10 and D with 11. So the encoded file will take 1/8 of the space of the
> original file.

A quarter.
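Here is the arithmetic behind the correction. Each letter shrinks from 8 bits to 2, and 2/8 = 1/4. When the number of letters is not a multiple of four, the last byte is only partly used, so the byte count rounds up. A small C++ helper (packed_size is a hypothetical name):

```cpp
#include <cstddef>

// Bytes needed to hold n letters at 2 bits each, four letters per byte.
std::size_t packed_size(std::size_t n)
{
    return (n + 3) / 4;  // round up for a final, partly filled byte
}
```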

Richard Heathfield, Jan 5, 2007
10. Umesh (Guest)

> Your question doesn't really belong in comp.lang.c
> someone to teach you the rudiments of the skill of writing programs.
>
> [snip]

Dear Lew,
I'm not asking you to write down the algorithm; I have already done that. I
want you to write part of the program. I won't ask you to do that
once I learn C well. Because I've only learnt to design algorithms and
determine their time complexity, I need your help.

I heard that it is easier to implement programs than to design an effective
algorithm, which I have already done. Actually, my original algorithm is far
more complex than this, but I want to start with the simple one. Because I
have no one to teach me, I think a little help sometimes comes in handy. That's
why I'm here. I hope that expert folks like you won't upset me. Thank
you. I look forward to hearing from you again. God bless you.

Umesh, Jan 5, 2007
11. jacob navia (Guest)

Umesh wrote:
>> [snip]
>
> Dear Lew,
> I'm not asking you to put down the algorithm which I already did. I
> want you to write a part of the program. I won't ask you to do that
> once I learn C well. Because I've only learnt to make algorithms and
> determine their time complexity, I need your help.
>

Lew is right.

You will NOT learn until you practice. And you will not practice
if somebody else does the work for you.

You must learn the basics first. Buy the book by Kernighan and
Ritchie, learn it, and then ask questions. I learned C that way,
and there wasn't anybody else there to ask questions. It is
perfectly doable if you WORK.

Or pay for a class in computer programming. That is possible too.

But we can't replace a teacher or a book, and we will not do your work
for you.

jacob navia, Jan 5, 2007
12. Lew Pitcher (Guest)

Umesh wrote:
[snip]
> I'm not asking you to put down the algorithm which I already did. I
> want you to write a part of the program.

Sorry, but no.

But I'll make it simpler for you.

First off, just write the code that frames your program. This is the
minimum code: just the startup and termination. Something like

#include <stdlib.h>
int main(void)
{
    return EXIT_SUCCESS;
}

Compile and run this code, making changes until it works. This
shouldn't take very long, as this rudimentary code is almost foolproof.

Next, add in the file open and close functions, and retest.

Next, add in the logic to choose between 'A', 'B', 'C', and 'D', and
retest.

Next, add in the logic to feed bit pairs to the output (this doesn't
/yet/ have to write the pairs to the output), and retest.

Next, add in the logic to write out the bit pairs to the output, and
retest.

Now you are done.

[snip]

Lew Pitcher, Jan 5, 2007
13. Guest

Richard Heathfield wrote:
> <snip>
>
> Here's some code to split a byte into four:
>
> void decode(char *letter, int ch)
> {
> const char alphabet[] = "ABCD";
>
> for(i = 0; i < 4; i++)
> {
> letter = alphabet[(ch & mask) >> (i * 2)];
> }
> }

Well, I am not sure, but I think you need mask <<= 2 each time round
the loop, and then you have to reverse letter;
or, if we know the number of digits,
shift ch & mask by the appropriate amount.
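Here is a sketch of the mask-shifting fix suggested above, in C++ (decode_with_mask is a hypothetical name). The mask starts at binary 11 on the lowest bit pair and moves left two bits per iteration. letter[] is filled from the end, so the highest-order pair lands in letter[0].

```cpp
// letter must point to at least four chars; no null terminator is written.
void decode_with_mask(char *letter, int ch)
{
    const char alphabet[] = "ABCD";
    int mask = 3;  // binary 11 over the lowest bit pair

    for (int i = 0; i < 4; i++)
    {
        // isolate pair i, shift it down, store it in reverse order
        letter[3 - i] = alphabet[(ch & mask) >> (i * 2)];
        mask <<= 2;
    }
}
```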

>
> letter must point to the first element in an array of at least four chars.
> Note that decode() does not build a string. If you want a string, deal with
> the null terminator yourself.
>
> If you are decoding, say, 0xAD, this is 10101101 in binary, and at the end
> of the decoding process letter[0] will store 'C', letter[1] will store 'C',
> letter[2] will store 'D', and letter[3] will store 'B'.
>

It is giving strange output.

, Jan 5, 2007
14. Richard Heathfield (Guest)

said:

> Richard Heathfield wrote:

<snip>

>> Here's some code to split a byte into four:
>>
>> void decode(char *letter, int ch)
>> {
>> const char alphabet[] = "ABCD";
>>
>> for(i = 0; i < 4; i++)
>> {
>> letter = alphabet[(ch & mask) >> (i * 2)];
>> }
>> }

>

oops

> Well i am not sure but i think and

oops squared

Let's just pretend that article didn't happen, shall we? :-(

Richard Heathfield, Jan 5, 2007
15. Jim Langston (Guest)

"Umesh" <> wrote in message
news:...
> [snip]

Long answer: it's probably more pain than it's worth.

The problem is that modern computers usually use 8-bit bytes. A byte is the
smallest unit the computer will handle as one unit, and it is usually 8 bits
(it may be more on some systems; C and C++ require at least 8). So, if you
defined each value as 2 bits (A, B, C, D or 0, 1, 2, 3 or whatever), it would
still need to be stored in a byte. With an 8-bit byte you could store 4 of
these in each byte. But since most computers deal with a minimum of 8 bits at
a time, it would be up to you to extract the bits yourself.

This means you couldn't use some simple define or such; you'd need to write a
class, and things get complicated from there.

Unless you are storing a whole lot of these, so that size becomes an issue, it
is easier to just waste the extra 6 bits of the byte and store each value in a
byte so you can use std::vector and such more easily.
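Here is a sketch of that simpler one-value-per-byte alternative in C++, assuming the text contains only 'A' to 'D' (to_codes is a hypothetical name). Each code can then be read with an ordinary codes[i]. The cost is n bytes for n letters instead of about n / 4.

```cpp
#include <string>
#include <vector>

// One code (0..3) per byte: no bit twiddling, plain indexing.
// Assumes every character is one of 'A', 'B', 'C', 'D'.
std::vector<unsigned char> to_codes(const std::string& text)
{
    std::vector<unsigned char> codes;
    codes.reserve(text.size());
    for (char c : text)
        codes.push_back(static_cast<unsigned char>(c - 'A'));  // 'A'->0 ... 'D'->3
    return codes;
}
```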

Jim Langston, Jan 5, 2007
16. Richard Heathfield (Guest)

Jim Langston said:

<snip>
>
> Unless you are storing a whole lot of these so size becomes an issue, it
> is easier to just waste the extra 6 bits of the byte and store each value
> in a byte so you can use std::vector and such more easy.

He can't use std::vector because there's no such thing. Except, of course,
that you know perfectly well that there *is* such a thing. But I know
you're wrong - and you know I'm wrong - and that's the trouble with
cross-posting.

Richard Heathfield, Jan 5, 2007
17. Jim Langston (Guest)

"Richard Heathfield" <> wrote in message
news:...
> Jim Langston said:
>
> <snip>
>>
>> Unless you are storing a whole lot of these so size becomes an issue, it
>> is easier to just waste the extra 6 bits of the byte and store each value
>> in a byte so you can use std::vector and such more easy.

>
> He can't use std::vector because there's no such thing. Except, of course,
> that you know perfectly well that there *is* such a thing. But I know
> you're wrong - and you know I'm wrong - and that's the trouble with
> cross-posting.

Oh, my bad. I missed the fact he cross posted. I even looked and made sure
I was replying to comp.lang.c++ before I gave a c++ answer. Yes, cross
posting is evil.

Jim Langston, Jan 5, 2007
18. Kenny McCormack (Guest)

In article <>,
Richard Heathfield <> wrote:
>Umesh said:
>
>> Suppose that I define an array of 2 bits {00,01,10,11} . Now when the
>> program finds A in the text file it replaces with 00, B with 01, C with
>> 10 and D with 11. So the encoded file will take 1/8 of the space of the
>> original file.

>
>A quarter.

True. But he's right, too. It does take an eighth of the space of the
original.

And then another eighth...

Kenny McCormack, Jan 6, 2007
19. Kenny McCormack (Guest)

In article <fCynh.59$>,
Jim Langston <> wrote:
>[snip]
>
>Oh, my bad. I missed the fact he cross posted. I even looked and made sure
>I was replying to comp.lang.c++ before I gave a c++ answer. Yes, cross
>posting is evil.

You know, it is interesting. "Everybody knows" that (i.e., the
conventional wisdom is that) cross-posting is better than multi-posting,
but I have often argued that that bit of CW 'taint so. For, among other
reasons, this one. Cross-posting assumes that answers are correct (or,
more precisely, that answers can be evaluated) regardless of which forum
they are posted in. A perfectly reasonable layman's position, to be
sure, but, as we see here, not good enough for us experts.

Whereas, if the newbie multi-posts (which is his natural inclination,
given that many [most?] of the commonly available newbie tools - i.e.,
Google and Microsoft - make proper cross-posting difficult), he could then
follow the responses independently in each forum and deal accordingly.

P.S. Specific example. Every once in a while, somebody will post some
sort of Unix-y/C-y question, cross-posting it to a dozen or so Unix-y/C-y
groups, including clc, and the clc pedants will do their usual:

Off topic. Not portable. Can't discuss it here. Blah, blah, blah.

routine, posting that bit of valuable information to, of course, all
dozen or so groups - after which the post degenerates into the usual clc
bickering about topicality, all the while being posted to all dozen or
so groups. This despite the fact that the post *was* topical in
probably all but one (or maybe two, if clc++ was included and the
participants there are as anal as the clc guys) of those groups.

And the point, of course, is that simple multi-posting would have
avoided this mess.

Kenny McCormack, Jan 6, 2007
20. Cesar Rabak (Guest)

Kenny McCormack wrote:
> In article <fCynh.59$>,

[snipped]
> And the point, of course, is that simple multi-posting would have
> avoided this mess.
>

Perhaps the point is that the 'experts' and zealots who are so quick
to post about non-topicality could change their behaviour: either
not post at all (replacing the 'off topic' reply with silence) or first
look at the header of the message to see whether it is cross-posted.

Cesar Rabak, Jan 6, 2007