Unpacking signed shorts and integers with specified endianness

Discussion in 'Ruby' started by Phrogz, Jun 18, 2007.

  1. Phrogz

    Phrogz Guest

    I'm deciphering a byte-based binary file structure which can be stored
    in either big or little endian. (The first byte of the file describes
    the format.) Among others, some of the fields are UINT32 (unsigned 4-
    byte integers), and some are INT32 (signed 4-byte integers).

    I know about BitStruct, but was trying to roll my own solution using
    String.unpack. (The file format has nested and repeating sections in
    it.) Given that I know exactly how many bytes I want for each field
    and whether the file is big or little endian, I thought it would be
    easy to pick a specific unpack character for each field. However,
    reading through the String.unpack docs, I can't find something that
    corresponds to signed 2-byte and 4-byte integers for a given
    endianness.

    Here's what I see (ASCII art ahead):
          |     signed     |    unsigned    |
    bytes |  big  | little |  big  | little |
    ------+-------+--------+-------+--------+
      1   |       c        |       C        |
      2   |   ?   |   ?    |   n   |   v    |
      4   |   ?   |   ?    |   N   |   V    |

    1) I see "l" (lowercase L) which is 4 bytes treated as a signed
    integer...but in 'native' endian order. Does String.unpack not provide
    a way to unpack a 4-byte signed integer with a specified endianness?

    2) I see "s" which is 2 bytes treated as a signed integer...but in
    'native' endian order. Does String.unpack not provide a way to unpack
    a 2-byte signed integer with a specified endianness?

    (The file format doesn't actually use any signed 2-byte integers, but
    I wanted to include them for completeness.)
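
    To make the gap concrete, here is a small sketch (the sample bytes are an assumption chosen for illustration: they are the big-endian two's-complement encoding of -123). The fixed-endian directives return unsigned values, while "l" is signed but follows the byte order of whatever machine runs the script.

```ruby
# Four sample bytes: the big-endian two's-complement encoding of -123.
bytes = [0xFF, 0xFF, 0xFF, 0x85].pack("C4")

# Fixed-endian directives exist, but only for unsigned values.
big_unsigned    = bytes.unpack("N").first  # big-endian UINT32
little_unsigned = bytes.unpack("V").first  # little-endian UINT32

# "l" is signed, but its byte order is the platform's, not the file's.
native_signed = bytes.unpack("l").first

puts big_unsigned     # 2**32 - 123, not -123
puts little_unsigned
puts native_signed    # -123 only on a big-endian machine
```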
     
    Phrogz, Jun 18, 2007
    #1

  2. Phrogz

    Phrogz Guest

    On Jun 18, 4:32 pm, Phrogz wrote:
    > 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
    > integer...but in 'native' endian order. Does String.unpack not provide
    > a way to unpack a 4-byte signed integer with a specified endianness?
    >
    > 2) I see "s" which is 2 bytes treated as a signed integer...but in
    > 'native' endian order. Does String.unpack not provide a way to unpack
    > a 2-byte signed integer with a specified endianness?


    Alright, let's try asking this question a different way. Given a
    binary file spec like the following, where the first byte tells you
    whether to interpret the rest of the file as little- or big-endian,
    and given the presence of a signed integer in the file, how would you
    write a parser for this?

    endian     UINT8    (1==little, 0==big)
    job_count  UINT8    # of repeated binary sections following this
    (jobs)     JOB*

    Each JOB section is:
      foo      UINT8
      bar      UINT16
      jim      UINT32
      jam      INT32
      jill     UINT8

    gib_mark   0xdead   # 2 bytes
    gib_count  UINT16   # of repeated binary sections following this
    (gibs)     GIB+

    Each GIB section is:
      gob      8 chars
      blurb    UINT8


    Would you invent a new character for String.unpack, split the string
    around it, and use knowledge of the native endianness of the platform
    you're running on to decide whether to pull out those 4 bytes
    independently, reverse them, and then unpack the result as a signed
    integer?

    Is there some sweet trick you could do after extracting 4 bytes as an
    integer to switch the implied interpreted endianness?

    Would you patch String.unpack in C to add options for specific-endian
    signed shorts and integers?

    Can you easily do the above (including the repeating sub-binary
    sections) with BitStruct?
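
    One way the layout above might be parsed with plain String#unpack is sketched below. It is a sketch under stated assumptions, not a definitive answer: the names parse_file and to_int32 are made up for illustration, gib_mark is assumed to be the literal bytes 0xde 0xad in that order regardless of the endian flag, the signed field is read as unsigned and then converted by hand, and truncated input is not handled.

```ruby
require "stringio"

# Hypothetical helper: reinterpret an unsigned 32-bit value as signed.
def to_int32(n)
  n >= 2**31 ? n - 2**32 : n
end

# Hypothetical parser for the layout described in the post.
def parse_file(data)
  io = StringIO.new(data)

  # First byte selects the byte order for every multi-byte field.
  little = io.read(1).unpack("C").first == 1
  u16 = little ? "v" : "n"
  u32 = little ? "V" : "N"

  job_count = io.read(1).unpack("C").first
  jobs = Array.new(job_count) do
    # JOB is 1 + 2 + 4 + 4 + 1 = 12 bytes; jam is read unsigned, then fixed.
    foo, bar, jim, jam, jill = io.read(12).unpack("C#{u16}#{u32}#{u32}C")
    { foo: foo, bar: bar, jim: jim, jam: to_int32(jam), jill: jill }
  end

  # Assumption: the marker is stored as the bytes DE AD in that order.
  mark = io.read(2)
  raise ArgumentError, "missing gib_mark" unless mark == [0xde, 0xad].pack("C2")

  gib_count = io.read(2).unpack(u16).first
  gibs = Array.new(gib_count) do
    gob, blurb = io.read(9).unpack("a8C")  # 8 chars plus one UINT8
    { gob: gob, blurb: blurb }
  end

  { jobs: jobs, gibs: gibs }
end

# Build a tiny big-endian sample (endian byte 0) with one JOB and one GIB.
sample = [0, 1].pack("C2") +
         [7, 0x0102, 5, 2**32 - 123, 9].pack("CnNNC") +
         [0xde, 0xad].pack("C2") +
         [1].pack("n") +
         ["gobgobgo", 3].pack("a8C")

result = parse_file(sample)
p result
```

    Picking the directive strings once, right after reading the endian byte, keeps the rest of the parser free of endianness checks; only the signed field needs the extra conversion step.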
     
    Phrogz, Jun 19, 2007
    #2

  3. Daniel Berger

    On Jun 18, 4:35 pm, Phrogz wrote:

    <snip>

    > 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
    > integer...but in 'native' endian order. Does String.unpack not provide
    > a way to unpack a 4-byte signed integer with a specified endianness?
    >
    > 2) I see "s" which is 2 bytes treated as a signed integer...but in
    > 'native' endian order. Does String.unpack not provide a way to unpack
    > a 2-byte signed integer with a specified endianness?


    See the 'N', 'n', 'V' and 'v' directives. There are equivalent
    directives for floats as well - 'E', 'e', 'G' and 'g'.

    Regards,

    Dan
     
    Daniel Berger, Jun 19, 2007
    #3
  4. Mark Day

    Mark Day Guest

    On Jun 19, 2007, at 11:19 AM, Daniel Berger wrote:

    >> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
    >> integer...but in 'native' endian order. Does String.unpack not provide
    >> a way to unpack a 4-byte signed integer with a specified endianness?
    >>
    >> 2) I see "s" which is 2 bytes treated as a signed integer...but in
    >> 'native' endian order. Does String.unpack not provide a way to unpack
    >> a 2-byte signed integer with a specified endianness?

    >
    > See the 'N', 'n', 'V' and 'v' directives. There are equivalent
    > directives for floats as well - 'E', 'e', 'G' and 'g'.


    Those handle endianness, but not signed values. I suppose you could
    unpack as unsigned, then manually test for the sign bit being set and
    correct the value. Even uglier, you could unpack as unsigned with
    desired endianness, repack as unsigned in native order, then unpack as
    signed in native order.
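
    Both workarounds described above can be sketched in a few lines. The method names are made up for illustration; the directive argument is expected to be "N" (big-endian) or "V" (little-endian).

```ruby
# Approach 1: unpack as unsigned, then correct the value by hand
# when the sign bit (bit 31) is set.
def signed32_via_sign_bit(bytes, directive)
  n = bytes.unpack(directive).first
  n >= 0x8000_0000 ? n - 0x1_0000_0000 : n
end

# Approach 2: unpack as unsigned with the desired endianness, repack as
# unsigned in native order ("L"), then unpack as signed in native order ("l").
def signed32_via_repack(bytes, directive)
  [bytes.unpack(directive).first].pack("L").unpack("l").first
end

big    = [0xFF, 0xFF, 0xFF, 0x85].pack("C4")  # -123, big-endian
little = big.reverse                          # -123, little-endian

puts signed32_via_sign_bit(big, "N")
puts signed32_via_repack(little, "V")
```

    The second approach works on any platform because the pack and unpack steps both use native order, so whatever that order is cancels out.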

    -Mark
     
    Mark Day, Jun 19, 2007
    #4
  5. Joel VanderWerf

    Mark Day wrote:
    > On Jun 19, 2007, at 11:19 AM, Daniel Berger wrote:
    >
    >>> 1) I see "l" (lowercase L) which is 4 bytes treated as a signed
    >>> integer...but in 'native' endian order. Does String.unpack not provide
    >>> a way to unpack a 4-byte signed integer with a specified endianness?
    >>>
    >>> 2) I see "s" which is 2 bytes treated as a signed integer...but in
    >>> 'native' endian order. Does String.unpack not provide a way to unpack
    >>> a 2-byte signed integer with a specified endianness?

    >>
    >> See the 'N', 'n', 'V' and 'v' directives. There are equivalent
    >> directives for floats as well - 'E', 'e', 'G' and 'g'.

    >
    > Those handle endianness, but not signed values. I suppose you could
    > unpack as unsigned, then manually test for the sign bit being set and
    > correct the value. Even uglier, you could unpack as unsigned with
    > desired endianness, repack as unsigned in native order, then unpack as
    > signed in native order.


    What bit-struct does in these cases is the first kind of ugly:

    # Let's say we start with a negative number packed in
    # 16 bits, big-endian:
    x = -123
    s = [x].pack("n")

    # Note that the sign is not packed with the number. It packs to the
    # same chars as 2**16 + x

    bits = 16
    max_unsigned = 2 ** bits
    max_signed = 2 ** (bits - 1)
    to_signed = proc { |n| (n >= max_signed) ? n - max_unsigned : n }

    puts to_signed[s.unpack("n").first] # ==> -123

    (This has come up a few times on the list -- search for "to_signed", for
    example.)

    It's still a hack, though, and I'd like to see Gavin's RCR go through,
    if the naming issues can be resolved.
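
    A note for readers on newer Rubies: releases from 1.9.3 onward, which postdate this thread, accept "<" (little-endian) and ">" (big-endian) modifiers on the native-size integer directives, so the workaround is no longer needed there. A minimal sketch, assuming such a Ruby version:

```ruby
# Requires a Ruby whose pack/unpack supports the "<" and ">" modifiers.
big    = [0xFF, 0xFF, 0xFF, 0x85].pack("C4")  # -123 as big-endian INT32
little = big.reverse                          # same value, little-endian

int32_be = big.unpack("l>").first     # signed 32-bit, big-endian
int32_le = little.unpack("l<").first  # signed 32-bit, little-endian
int16_be = [0xFF, 0x85].pack("C2").unpack("s>").first  # signed 16-bit, big-endian

puts int32_be, int32_le, int16_be
```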

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
     
    Joel VanderWerf, Jun 24, 2007
    #5
