how to write bits by using ofstream?

S

Sowen

Hi, all

I am wondering how to write bits by using ofstream?

I have finished a huffman tree, but how can I write the bits to the file in
order to gain compression?

for example, 'A' returns a code '1101', what I should write? 13? no, I don't
think so


Just now I tried my own thinking, 'cause 4 bits can represent a hex, so I
write 'D' instead of '13' in the above example,
but, I can only gain compression if there are a few unique characters, like
"abc abc abc" or "aaaaaaaaaaaaaaa"
If there are many different characters, my thinking cannot "compress" the
file.


I have worked on this for couple of days, still can't figure out what I
should write and how I can write the bits to the file in order to compress
file.
Please help, thank you very much!



--
Best Regards!
Sowen Cheung
http://com.angGoGo.com
http://www.angGoGo.com
http://biz.angGoGo.com
 
V

Victor Bazarov

Sowen said:
I am wondering how to write bits by using ofstream?

I have finished a huffman tree, but how can I write the bits to the file
in order to gain compression?

You cannot write bits. The smallest amount you are allowed to write
is a char (byte). In order to write bits, you need to pack them in
a bunch of bytes and then write those bytes to the file stream.

If after your effort you have 269 bits to write, pack them up, you
will get ceil(269/CHAR_BIT) bytes (on many systems CHAR_BITS is 8,
so you get 34 bytes), and write those 34 bytes out.

This is beyond the scope of comp.lang.c++, however, but you need to
somehow identify the sequences of bits. My guess is that you should
read more on the subject of compression and how compressed information
is saved to files. Imagine that once you written those packed bits
out to a file, you will want to be able to read and unpack them in
some way. So, you cannot simply stack those bits in a large byte
array, you need extra information placed there to help with unpacking.

The Web undoubtedly contains enough information on the subject.

V
 
J

Jerry Coffin

Sowen said:
Hi, all

I am wondering how to write bits by using ofstream?

I have finished a huffman tree, but how can I write the bits to the
file in order to gain compression?

for example, 'A' returns a code '1101', what I should write? 13? no,
I don't think so


Just now I tried my own thinking, 'cause 4 bits can represent a hex,
so I write 'D' instead of '13' in the above example, but, I can only
gain compression if there are a few unique characters, like "abc abc
abc" or "aaaaaaaaaaaaaaa" If there are many different characters, my
thinking cannot "compress" the file.

Normally, you accumulate bits until you get at least one full byte, and
then write that to the stream. In reality, one byte is also a pretty
small amount -- it's usually better to accumulate a buffer full of
bytes before writing out the data.
 
J

Jerry Coffin

Victor Bazarov wrote:

[ ... ]
This is beyond the scope of comp.lang.c++, however, but you need to
somehow identify the sequences of bits. My guess is that you should
read more on the subject of compression and how compressed
information is saved to files. Imagine that once you written those
packed bits out to a file, you will want to be able to read and
unpack them in some way. So, you cannot simply stack those bits in
a large byte array, you need extra information placed there to help
with unpacking.

Huffman-compressed data does not require such extra information.

The bits in a Huffman-compressed stream represent decisions in a binary
tree. As you read bits in, you use them to traverse the tree. When you
reach a leaf node in the tree, you know you've reached the end of one
symbol. The next bit is the beginning of the next compressed symbol,
with it you restart traversing the tree from the root.
 
S

Sowen

Hi, Jerry

Do you mean the following:

repeat
read char from inFile
get code from tree
put code in array
if length of array > 8, write to file
remove the code that has been written from the array
until inFile.eof

so, I can use maybe "unsigned char code[8]" to store 1 byte, is it right?
if I am using ofstream, I can do

outFile << ios:binary << code

like that ?

sorry, I am a little confused in here

I didn't try the code yet, so I am not sure whether it works

--
Best Regards!
Sowen Cheung
http://com.angGoGo.com
http://www.angGoGo.com
http://biz.angGoGo.com
 
S

Sowen

ok, I found writing unsigned char array like that is wrong

could you tell me how to write 8 bits as 1 byte to the file?
thanks

--
Best Regards!
Sowen Cheung
http://com.angGoGo.com
http://www.angGoGo.com
http://biz.angGoGo.com
Sowen said:
Hi, Jerry

Do you mean the following:

repeat
read char from inFile
get code from tree
put code in array
if length of array > 8, write to file
remove the code that has been written from the array
until inFile.eof

so, I can use maybe "unsigned char code[8]" to store 1 byte, is it right?
if I am using ofstream, I can do

outFile << ios:binary << code

like that ?

sorry, I am a little confused in here

I didn't try the code yet, so I am not sure whether it works

--
Best Regards!
Sowen Cheung
http://com.angGoGo.com
http://www.angGoGo.com
http://biz.angGoGo.com
Jerry Coffin said:
Normally, you accumulate bits until you get at least one full byte, and
then write that to the stream. In reality, one byte is also a pretty
small amount -- it's usually better to accumulate a buffer full of
bytes before writing out the data.

--
Later,
Jerry.

The universe is a figment of its own imagination.
 
E

Evan

However, what Victor is saying is that you need to store that tree.

Which is half true. Yes, the decompressing algorithm needs to know the
tree, and storing the tree with the file leads to the best compression
of the actual data part. (The tree's overhead might make this untrue
for the whole.) However, there are two other choices. The first is to
use the same tree for every file, which could work well if it's always
compressing the same type of file (e.g. natural language could probably
be compressed with such a scheme almost as well as computing frequencys
and building a tree for every file). The second is to use an adaptive
Huffman algorithm, which starts at a known point and builds the tree as
the input flows in and is compressed.
 
J

Jerry Coffin

Sowen said:
Hi, Jerry

Do you mean the following:

repeat
read char from inFile
get code from tree
put code in array
if length of array > 8, write to file
remove the code that has been written from the array
until inFile.eof

Yes, that's the rough idea.
so, I can use maybe "unsigned char code[8]" to store 1 byte, is it right?
if I am using ofstream, I can do

outFile << ios:binary << code

like that ?

No -- first of all, ios::binary is used when you open a file. Second
two write out a byte, you want a single unsigned char, NOT an array of
8 of them -- that would hold 8 bytes.

So, assume your first character gives a code of 1101. You put that into
the first four bits of your byte, and then get the next code for the
next character. We'll say that gives a code of 10111 (five bits). You
put the first four of those bits into the same character as you already
have holding the bits of previous code. That gives you an 8-bit byte
that you can then write out to the file. You now have one bit as the
beginning of the next code, so you save it and read in the third input
character.

Despite its mostly deservedly being denigrated elsewhere, you might
find it convenient to use a vector<bool> to accumulate the bits before
you write them out to the file.
 
H

Howard

Jerry Coffin said:
Sowen said:
Hi, Jerry

Do you mean the following:

repeat
read char from inFile
get code from tree
put code in array
if length of array > 8, write to file
remove the code that has been written from the array
until inFile.eof

Yes, that's the rough idea.
so, I can use maybe "unsigned char code[8]" to store 1 byte, is it right?
if I am using ofstream, I can do

outFile << ios:binary << code

like that ?

No -- first of all, ios::binary is used when you open a file. Second
two write out a byte, you want a single unsigned char, NOT an array of
8 of them -- that would hold 8 bytes.

So, assume your first character gives a code of 1101. You put that into
the first four bits of your byte, and then get the next code for the
next character. We'll say that gives a code of 10111 (five bits). You
put the first four of those bits into the same character as you already
have holding the bits of previous code. That gives you an 8-bit byte
that you can then write out to the file. You now have one bit as the
beginning of the next code, so you save it and read in the third input
character.

From the original post, it looks like what's being returned in characters
'1' and '0, in some kind of string or character array. If that's the case,
then to put that information into a byte requires a little bit-twiddling.
You'll need some kind of index as to which bit in the byte is next to be
written into, and an "if" statement which sets that bit to 0 if '0' is the
next character to be packed and sets it to 1 if '1' is the next character to
be packed. (Actually, if you set the byte you're writing to to 0 each time
you start packing into it again, then you only need to set the 1 bits when
you encounter a '1'.) If you don't know how to set a bit to 1, ask again or
check out Google for some examples.

-Howard
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,581
Members
45,056
Latest member
GlycogenSupporthealth

Latest Threads

Top