Writing single bits to a file

Discussion in 'C Programming' started by riva, Oct 26, 2007.

  1. riva

    riva Guest

    I am developing a compression program. Is there any way to write a
    data to file in the form of bits, like write bit 0 then bit 1 and then
    bit 1 and so on ....
    riva, Oct 26, 2007
    #1
    1. Advertising

  2. riva

    Eric Sosman Guest

    riva wrote On 10/26/07 16:00,:
    > I am developing a compression program. Is there any way to write a
    > data to file in the form of bits, like write bit 0 then bit 1 and then
    > bit 1 and so on ....


    C's I/O streams use characters, nothing smaller.
    You should accumulate bits until you have a full
    character's worth, then write the character. At the
    end, you may need to provide a few extra bits to fill
    up the final character.

    --
    Eric Sosman, Oct 26, 2007
    #2
    1. Advertising

  3. riva

    riva Guest

    Well, what if I have just 19 bytes to write that means 2 chars and 3
    bits!
    riva, Oct 26, 2007
    #3
  4. riva

    Guest

    On Oct 26, 3:42 pm, riva <> wrote:
    > Well, what if I have just 19 bytes to write that means 2 chars and 3
    > bits!


    Ah, no. That means 19 chars and 0 bits. But what's your point? I
    create a 1 byte file on my computer and it take 4,096 bytes on my hard
    disk (plus room for file name, creation date, etc.)
    , Oct 26, 2007
    #4
  5. In article <>,
    riva <> wrote:
    >Well, what if I have just 19 bytes to write that means 2 chars and 3
    >bits!


    This is really an algorithm matter rather than a C matter, as you
    are unlikely to find very many systems at all that allow you to
    end files on arbitrary bit boundaries, no matter which language you use.


    In the "alphabet" known to the compressor, include an artificial
    "end of text" token. When you get to the end of the input source,
    encode that token; then follow it with whatever amount of junk
    bits you need to reach a character boundary. Your decompressor
    will know not to examine the junk bits because it will decode the
    end-of-text token and know not to go further.

    In practice, you will find that it is often useful in a compression
    routine to have a "flush dictionary" token with the same property
    of "ignore bits until the next octet boundary". Instead of having
    a specific "end of text" token, you can use the flush-dictionary
    token and then end the file.

    The flush-dictionary token allows you to adapt to changing conditions
    in the source data; you could imagine, for example, that the
    statistics useful for compressing the introduction to a document
    might be quite poor for compressing the technical material in the
    main document itself. Also, a flush-dictionary token has the
    useful property of allowing you to append the compressed versions
    of several files together, and have the result decompress to the
    original files appended together -- useful for building an archive
    for example.
    --
    "Any sufficiently advanced bug is indistinguishable from a feature."
    -- Rich Kulawiec
    Walter Roberson, Oct 26, 2007
    #5
  6. riva

    Eric Sosman Guest

    (When you reply to a Usenet post, please quote enough
    context to make your reply meaningful as a stand-alone
    message. People don't always follow a discussion serially
    from its start, and someone whose first encounter with a
    thread is something like -- well, like what you've written
    below -- will have a hard time discovering what the topic
    is, and are likely to ignore it. Thus, you risk depriving
    yourself of their expertise. For those just tuning in, riva
    asked how to write individual bits to a stream.)

    riva wrote On 10/26/07 16:42,:
    > Well, what if I have just 19 bytes to write that means 2 chars and 3
    > bits!


    I think you mean "bits," not "bytes." And that's why
    I wrote

    >> [...] At the
    >> end, you may need to provide a few extra bits to fill
    >> up the final character.


    .... in the part you didn't quote. Is there something in
    my sentence that you find confusing?

    --
    Eric Sosman, Oct 26, 2007
    #6
  7. riva

    cr88192 Guest

    "Eric Sosman" <> wrote in message
    news:1193429173.343059@news1nwk...
    > riva wrote On 10/26/07 16:00,:
    >> I am developing a compression program. Is there any way to write a
    >> data to file in the form of bits, like write bit 0 then bit 1 and then
    >> bit 1 and so on ....

    >
    > C's I/O streams use characters, nothing smaller.
    > You should accumulate bits until you have a full
    > character's worth, then write the character. At the
    > end, you may need to provide a few extra bits to fill
    > up the final character.
    >


    yes. an example here is to maintain an integer (say, 32 bits), and then a
    count of how many bits (left or right, depending on bit ordering) have been
    written to this word. adding new bits consists of shifting and or'ing.

    whenever the count is >=8, you can write a byte (say, with fputc or
    whatever...).
    after this is done, you shift the value (left or right), and subtract 8 from
    the count.

    if one can figure it out, it is also possible to write multi-bit values
    (with a 32 bit word, up to 24 bits at a time is safe).

    usually then, at the end, one will flush any bits remaining in the word to
    the file.


    oddly enough, bit ordering also naturally effects the word endianness, where
    going from high to low (or left to right) results in big-endian ordering,
    and going from low to high (or right to left) results in little-endian.

    compression people have seemingly never really come into agreement over this
    issue though, which is why many compressors do it one way, and others the
    other...


    > --
    >
    cr88192, Oct 26, 2007
    #7
  8. riva

    Don Bruder Guest

    In article <>,
    riva <> wrote:

    > Well, what if I have just 19 bytes to write that means 2 chars and 3
    > bits!


    (Assuming you actually meant 19 BITS, rather than 19 BYTES...)

    Then stuff the three leftover bits into a third byte, pad that byte with
    5 bits of your choice (personally, I'd go with zeroes, but that's just
    me) and write it.

    Considering that every currently-in-use OS uses the concept of "blocks",
    "chunks", or "clusters" as the smallest possible unit for disk I/O, with
    each of them being made up of some multiple of 128 bytes, it's
    incredibly unlikely that "saving" 5 bits is going to be meaningful in
    any realistic situation.

    --
    Don Bruder - - If your "From:" address isn't on my whitelist,
    or the subject of the message doesn't contain the exact text "PopperAndShadow"
    somewhere, any message sent to this address will go in the garbage without my
    ever knowing it arrived. Sorry... <http://www.sonic.net/~dakidd> for more info
    Don Bruder, Oct 26, 2007
    #8
  9. riva

    Joe Wright Guest

    riva wrote:
    > Well, what if I have just 19 bytes to write that means 2 chars and 3
    > bits!
    >


    I assume you mean 19 bits. You will have to write 3 bytes (24 bits) to
    get your 19 onto the stream.

    --
    Joe Wright
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
    Joe Wright, Oct 26, 2007
    #9
  10. Don Bruder <> writes:
    > In article <>,
    > riva <> wrote:
    >
    >> Well, what if I have just 19 bytes to write that means 2 chars and 3
    >> bits!

    >
    > (Assuming you actually meant 19 BITS, rather than 19 BYTES...)
    >
    > Then stuff the three leftover bits into a third byte, pad that byte with
    > 5 bits of your choice (personally, I'd go with zeroes, but that's just
    > me) and write it.
    >
    > Considering that every currently-in-use OS uses the concept of "blocks",
    > "chunks", or "clusters" as the smallest possible unit for disk I/O, with
    > each of them being made up of some multiple of 128 bytes, it's
    > incredibly unlikely that "saving" 5 bits is going to be meaningful in
    > any realistic situation.


    Sure, but it's not just a matter of saving space. The size of a file
    (i.e., the number of bytes that have been written to it) can be an
    important piece of information.

    Under many operating systems, the size of a file is *recorded* as an
    exact number of bytes, even if the physical size on disk is a multiple
    of some larger block size. (However, the C standard doesn't guarantee
    this; binary files can be padded at the end with null characters.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Oct 27, 2007
    #10
  11. riva

    Tor Rustad Guest

    riva wrote:
    > I am developing a compression program. Is there any way to write a
    > data to file in the form of bits, like write bit 0 then bit 1 and then
    > bit 1 and so on ....


    The way this can be done in C compression SW, is if you provide your own
    bit level I/O functions. However, in standard C those I/O functions need
    to commit no less than chunks of size CHAR_BIT to a file, which is at
    least an 8 bit object.

    Standard practice is to pad the last character, just like we do in
    communication or encryption SW. For details, look up a data compression
    book, or ask in a data compression news group.

    --
    Tor <torust [at] online [dot] no>

    "Technical skill is mastery of complexity, while creativity is mastery
    of simplicity"
    Tor Rustad, Oct 27, 2007
    #11
  12. riva

    Eric Sosman Guest

    Keith Thompson wrote:
    > [...]
    > Sure, but it's not just a matter of saving space. The size of a file
    > (i.e., the number of bytes that have been written to it) can be an
    > important piece of information.


    That's really not much help for the O.P.'s situation:
    He's working on some kind of compression program, which can
    generate a stream of output bits that doesn't "chunk" neatly
    into an integral number of bytes. No file system I've ever
    heard of can record a file length of 10007 bits. And it
    gets worse: Some of the best compressors encode their output
    in fractions of bits; if a file length of 10007 bits is bad,
    a length of 10006.59029663+ bits is *really* bad!

    See Walter Roberson's response.

    --
    Eric Sosman
    lid
    Eric Sosman, Oct 27, 2007
    #12
  13. "Don Bruder" <> wrote in message
    > Considering that every currently-in-use OS uses the concept of "blocks",
    > "chunks", or "clusters" as the smallest possible unit for disk I/O, with
    > each of them being made up of some multiple of 128 bytes, it's
    > incredibly unlikely that "saving" 5 bits is going to be meaningful in
    > any realistic situation.
    >

    Create thousands of such files, then tar them. See if you can detect a
    difference between ten thousand 128 byte files and ten thousand 18-byte
    files.

    --
    Free games and programming goodies.
    http://www.personal.leeds.ac.uk/~bgy1mm
    Malcolm McLean, Oct 27, 2007
    #13
  14. riva

    Martin Wells Guest

    riva:

    > I am developing a compression program. Is there any way to write a
    > data to file in the form of bits, like write bit 0 then bit 1 and then
    > bit 1 and so on ....



    Maybe something like: (Unchecked code)

    typedef struct BitWriterFileInfo {
    FILE *pfile;
    unsigned cur_byte_val,
    i_bit;
    } BitWriterFileInfo;

    void InitBitWriterFileInfo(BitWriterFileInfo *const p, FILE *const
    pfile)
    {
    p->pfile = pfile;
    p->cur_byte_val = 0;
    p->i_bit = 0;
    }


    void WriteBit(BitWriterFileInfo *const pout,int const val)
    {
    if (val)
    {
    p->cur_byte_val |= 1u << p->i_bit;
    }

    if (CHAR_BIT == ++(p->i_bit))
    {
    P->i_bit = 0;
    P->cur_byte_val = 0;

    fprintf(pout->pfile....
    }
    }

    You'd need a finaliser routine aswell if (bits_written % CHAR_BIT) is
    anything other than zero.

    Object-orientated programming would fit here very well by the way.

    Martin
    Martin Wells, Oct 27, 2007
    #14
  15. "Martin Wells" <> a écrit dans le message de news:
    ...
    > riva:
    >
    >> I am developing a compression program. Is there any way to write a
    >> data to file in the form of bits, like write bit 0 then bit 1 and then
    >> bit 1 and so on ....

    >
    >
    > Maybe something like: (Unchecked code)


    Unchecked indeed ;-)

    > typedef struct BitWriterFileInfo {
    > FILE *pfile;
    > unsigned cur_byte_val,
    > i_bit;
    > } BitWriterFileInfo;
    >
    > void InitBitWriterFileInfo(BitWriterFileInfo *const p, FILE *const
    > pfile)
    > {
    > p->pfile = pfile;
    > p->cur_byte_val = 0;
    > p->i_bit = 0;
    > }
    >
    >
    > void WriteBit(BitWriterFileInfo *const pout,int const val)
    > {
    > if (val)
    > {
    > p->cur_byte_val |= 1u << p->i_bit;


    p should be pout

    > }
    >
    > if (CHAR_BIT == ++(p->i_bit))
    > {
    > P->i_bit = 0;
    > P->cur_byte_val = 0;


    P should be pout too ;-)

    You should write the byte to the stream before clearing it!

    > fprintf(pout->pfile....


    putc(pout->cur_byte_val, pout->pfile);
    pout->cur_byte_val = pout->i_bit = 0;

    > }
    > }


    > You'd need a finaliser routine aswell if (bits_written % CHAR_BIT) is
    > anything other than zero.


    Yes.

    > Object-orientated programming would fit here very well by the way.


    And getting rid of these extra const qualifiers on the function parameters
    would improve readability, making parameter name mixup more obvious. A good
    illustration of my remarks elsethread.

    --
    Chqrlie.
    Charlie Gordon, Oct 27, 2007
    #15
  16. riva

    Martin Wells Guest

    Chqrlie:

    > And getting rid of these extra const qualifiers on the function parameters
    > would improve readability, making parameter name mixup more obvious.



    To be honest I'll use const wherever I can, unless it's stupid (and by
    stupid I mean makes no difference whatsoever, or that the difference
    has no benefits). An example of such a "stupid" case would be casting
    to a const type, reason being that it doesn't make a difference
    whether an R-value is const or not because you can't modify it anyway.
    If I take a pointer as an argument to a function, and if I don't
    intend on changing the pointer at all, then it makes sense to me to
    declare it as const. Also, I don't find that it detracts form
    readibility or that it adds confusion, so I suppose we just have
    different opinions on this.

    Just an aside. . . in C, the whole BitWriterFileInfo thing would be
    implemented as follows:

    int main(void)
    {
    BitWriterFileInfo obj;

    InitBigWriterFileInfo(&obj, stdout);

    /* Use it */

    FinaliseBigWriterFileInfo(&obj);

    return 0;
    }

    , whereas in something like C++, it would be:

    int main()
    {
    BigFileWriter obj;

    /* Use it */
    }

    Usually, I'm mad for the ol' procedural programming and I use it 9
    times out of 10, but this is definitely one of the places where I'd
    opt for the object orientated.

    Martin
    Martin Wells, Oct 27, 2007
    #16
  17. "Martin Wells" <> a écrit dans le message de news:
    ...
    > Chqrlie:
    >
    >> And getting rid of these extra const qualifiers on the function
    >> parameters
    >> would improve readability, making parameter name mixup more obvious.

    >
    > To be honest I'll use const wherever I can, unless it's stupid (and by
    > stupid I mean makes no difference whatsoever, or that the difference
    > has no benefits). An example of such a "stupid" case would be casting
    > to a const type, reason being that it doesn't make a difference
    > whether an R-value is const or not because you can't modify it anyway.
    > If I take a pointer as an argument to a function, and if I don't
    > intend on changing the pointer at all, then it makes sense to me to
    > declare it as const. Also, I don't find that it detracts form
    > readibility or that it adds confusion, so I suppose we just have
    > different opinions on this.


    Given your choice of field names, you shouldn't be too proud of your coding
    conventions, stylistic or otherwise. We do disagree on what you consider
    'stupid' as well. IMHO, if the need arises to cast a qualified pointer to a
    different type in an expression, needlessly unqualifying it at the same time
    *is* stupid.

    const qualifying the parameters of a 3 line function is completely useless.
    Just like casting strcpy to (void) and casting the result of malloc().

    --
    Chqrlie.
    Charlie Gordon, Oct 27, 2007
    #17
  18. riva

    Martin Wells Guest

    Chqrlie:

    > Given your choice of field names, you shouldn't be too proud of your coding
    > conventions, stylistic or otherwise.



    By "field names", do you mean the names I choose for functions and
    variables? What in particular is wrong with them? Please take one of
    them as an example and point out the (perceived) flaws to me.


    > We do disagree on what you consider
    > 'stupid' as well. IMHO, if the need arises to cast a qualified pointer to a
    > different type in an expression, needlessly unqualifying it at the same time
    > *is* stupid.



    That's not what I mean. I meant the likes of:

    int i;
    double x;

    ...

    i = (int const)x;

    Writint "int const" instead of "int" has no merit. (Let's assume that
    the cast was present to suppress a compiler warning). But even if we
    take a pointer type, we could have:

    int i;

    char *p = (char*)&i;
    char *p = (char const *)&i; /* This is stupid */


    > const qualifying the parameters of a 3 line function is completely useless.
    > Just like casting strcpy to (void) and casting the result of malloc().



    I disagree. The merit is that you'll get a compiler error if you try
    to alter something, which, you didn't intend to alter in the first
    place.

    The be-all and end-all of it though is that I do be consistent with
    const -- i.e. if I'm don't plan on changing it, then make it const.

    Martin
    Martin Wells, Oct 27, 2007
    #18
  19. riva

    cr88192 Guest

    "Charlie Gordon" <> wrote in message
    news:4723a669$0$11684$...
    > "Martin Wells" <> a écrit dans le message de news:
    > ...
    >> Chqrlie:
    >>
    >>> And getting rid of these extra const qualifiers on the function
    >>> parameters
    >>> would improve readability, making parameter name mixup more obvious.

    >>
    >> To be honest I'll use const wherever I can, unless it's stupid (and by
    >> stupid I mean makes no difference whatsoever, or that the difference
    >> has no benefits). An example of such a "stupid" case would be casting
    >> to a const type, reason being that it doesn't make a difference
    >> whether an R-value is const or not because you can't modify it anyway.
    >> If I take a pointer as an argument to a function, and if I don't
    >> intend on changing the pointer at all, then it makes sense to me to
    >> declare it as const. Also, I don't find that it detracts form
    >> readibility or that it adds confusion, so I suppose we just have
    >> different opinions on this.

    >
    > Given your choice of field names, you shouldn't be too proud of your
    > coding conventions, stylistic or otherwise. We do disagree on what you
    > consider 'stupid' as well. IMHO, if the need arises to cast a qualified
    > pointer to a different type in an expression, needlessly unqualifying it
    > at the same time *is* stupid.
    >


    'const' is a keyword I don't really know if I have ever really used...

    reason: it doesn't really do anything, beyond telling the compiler to
    complain to me about stuff I should sanely know already anyways...

    now, of course, there are cases where I do want the compiler to complain:
    missing prototypes.

    reason: this leads to very real and very messy results.
    it also helps me maintain general strict modularity conventions (I can't
    call any code I did not include a header for, or for detecting functions
    which have ceased to exist, which is useful when reorganizing my codebase).
    ....

    note that I make this tolerable, mostly because I tend to use a special tool
    to 'autoheader' my source code (though, in my case, in a few places this
    does lead to special comments telling the tool to ignore certain functions,
    ....).


    > const qualifying the parameters of a 3 line function is completely
    > useless. Just like casting strcpy to (void) and casting the result of
    > malloc().
    >


    I will disagree with the casting the result of malloc.

    main reason:
    it is necessary if one wants their code to compile in both C and C++
    compilers, since C++ usually raises a big fuss about uncast conversions
    to/from 'void *'...


    > --
    > Chqrlie.
    >
    cr88192, Oct 27, 2007
    #19
  20. riva

    CBFalconer Guest

    cr88192 wrote:
    >

    .... snip ...
    >
    > I will disagree with the casting the result of malloc.
    >
    > main reason:
    > it is necessary if one wants their code to compile in both C and
    > C++ compilers, since C++ usually raises a big fuss about uncast
    > conversions to/from 'void *'...


    If you write code and try to routinely compile it with both C and
    C++ compilers you deserve whatever happens to you. The languages
    are _different_. The multiple compilers is not normally useful
    anyhow, since you can always compile C in such a manner as to be
    linkable to C++, but not the reverse.

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>



    --
    Posted via a free Usenet account from http://www.teranews.com
    CBFalconer, Oct 28, 2007
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. GGG
    Replies:
    10
    Views:
    12,515
    Donar
    Jul 6, 2006
  2. sarmin kho
    Replies:
    2
    Views:
    818
    A. Lloyd Flanagan
    Jun 15, 2004
  3. Miki Tebeka
    Replies:
    1
    Views:
    433
    Marcin 'Qrczak' Kowalczyk
    Jun 14, 2004
  4. sergey

    "casting" bits to bits?

    sergey, Nov 8, 2006, in forum: VHDL
    Replies:
    1
    Views:
    691
    sergey
    Nov 8, 2006
  5. riva
    Replies:
    2
    Views:
    371
    Roedy Green
    Oct 27, 2007
Loading...

Share This Page