unpack signed short in network (big-endian) byte order

baumanj · Mar 29, 2006

Is there any particular reason why there is no format specifier in the
String.unpack method for unpacking a signed short in network byte
order? There seems only to be one for unsigned, which can then be
converted to signed with a proc like this:

length = 16
max = 2**length-1
mid = 2**(length-1)
to_signed = proc {|n| (n >= mid) ? -((n ^ max) + 1) : n}

It seems reasonable to add a format specifier to do this directly.
Thoughts?
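
As a minimal sketch of the workaround described above, the proc can be applied to the result of the unsigned network-short directive ('n'). The sample bytes and the variable names bytes, unsigned and signed are illustrative assumptions, not part of the original post.

```ruby
# Two's-complement conversion for a 16-bit value, as in the proc above.
length = 16
max = 2**length - 1
mid = 2**(length - 1)
to_signed = proc { |n| (n >= mid) ? -((n ^ max) + 1) : n }

# Illustrative input: the two bytes 0xFF 0xFE, i.e. -2 as a big-endian signed short.
bytes = "\xFF\xFE".b

unsigned = bytes.unpack('n').first   # read as unsigned network short
signed = to_signed.call(unsigned)    # reinterpret as signed

puts "unsigned=#{unsigned} signed=#{signed}"
```

Values below the midpoint pass through unchanged, so the same proc can be mapped over every element returned by unpack.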

Joel VanderWerf · Mar 29, 2006

I'd vote for that. Looks like you already found this thread:

http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/ruby/ruby-talk/165774?165771-168359

baumanj · Mar 30, 2006

Perhaps this should be a new subject like "adding formats for
pack/unpack". I gave this a bit of thought, and there are quite a few
formats that seem to be missing. I would naturally assume that there
should be support for each combination of

{unsigned, signed} x {native, network, little-endian} x {short, int,
long}

However, of these 18 formats, 8 are missing. There are more than enough
unused characters to specify them (j J k K o O r R t T W y Y z). I
tried to come up with consistent mappings based on what already exists
and this is what I came up with (?s indicate that the format is not
currently supported):

S unsigned native short
s signed native short
n unsigned network short
o ? signed network short
v unsigned little-endian short
r ? signed little-endian short

I unsigned native int
i signed native int
J ? unsigned network int
j ? signed network int
K ? unsigned little-endian int
k ? signed little-endian int

L unsigned native long
l signed native long
N unsigned network long
y ? signed network long
V unsigned little-endian long
y ? signed little-endian long

I'm thinking of submitting an RCR for this, but before I go through the
effort of figuring out the c code to achieve it, I wanted some opinions
as to whether this is worthwhile and whether the mappings are sensible.

Thanks.

Eric Hodel · Mar 30, 2006

|Perhaps this should be a new subject like "adding formats for
|pack/unpack". I gave this a bit of thought, and there are quite a few
|formats that seem to be missing. I would naturally assume that there
|should be support for each combination of
|
|{unsigned, signed} x {native, network, little-endian} x {short, int,
|long}
|
|However, of these 18 formats, 8 are missing. There are more than enough
|unused characters to specify them (j J k K o O r R t T W y Y z). I
|tried to come up with consistent mappings based on what already exists
|and this is what I came up with (?s indicate that the format is not
|currently supported):
|[...]
|
|I'm thinking of submitting an RCR for this,

Don't bother with an RCR, just submit a patch to ruby-core.

|but before I go through the effort of figuring out the c code to
|achieve it, I wanted some opinions as to whether this is worthwhile
|and whether the mappings are sensible.

What does perl or python do for the same types?

ara.t.howard · Mar 30, 2006

i use pack/unpack a lot and would love to see more types handled. i'm with
eric on checking perl/python. however, the reality is that pack/unpack are so
dang cryptic it hardly matters - i always need to read the docs ;-) that said
they are extremely useful.

good luck.

-a

baumanj · Mar 31, 2006

Perl's pack options seem to be largely the same. It has the same holes
that ruby does. One interesting wrinkle is that where ruby uses the
underscore ("_") suffix to enforce native lengths, perl uses a bang
("!"). I don't know why that's different.

http://www.xav.com/perl/lib/Pod/perlfunc.html#item_pack

Python has a rather different approach. There are only a few specifiers
but integer length and byte-order are defined by a modifier at the
front of the format string.

http://docs.python.org/lib/module-struct.html

I sort of like the Python idea, because then we wouldn't have to
introduce more cryptic format specifiers. However, it eliminates the
(potentially unlikely) ability to have different byte-ordering for
different values within the array being packed. Perhaps the best
solution is to allow the modifier at the beginning to establish a
default, but allow modifiers after specifiers as well in order to
override the default. On the other hand, maybe it would be best to
stick to one or the other approach. The former is simpler and probably
covers the common usage, but the latter is more similar to the way the
function currently works.

I don't know. Opinions?
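
For readers who want to try the "modifier after the specifier" option: as far as I know, Ruby 1.9.3 and later accept ">" (big-endian) and "<" (little-endian) after the s, S, l, L, q and Q directives. The sketch below assumes such a Ruby version; the sample bytes and variable names are illustrative only.

```ruby
# Assumes a Ruby version that supports the '<' and '>' endianness
# modifiers on integer directives (1.9.3 or later, to my knowledge).
data = "\xFF\xFE\x00\x01".b

# Same four bytes read as two signed 16-bit values in each byte order.
big_endian = data.unpack('s>2')
little_endian = data.unpack('s<2')

# A signed 32-bit big-endian value, i.e. a "signed network long".
signed_network_long = "\x80\x00\x00\x00".b.unpack('l>').first

puts big_endian.inspect
puts little_endian.inspect
puts signed_network_long
```

Because the modifier attaches to each directive, a single format string can still mix byte orders, which is the flexibility the Python-style prefix gives up.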

Yukihiro Matsumoto · Mar 31, 2006

Hi,

In message "Re: unpack signed short in network (big-endian) byte order"
|
|Perl's pack options seem to be largely the same. It has the same holes
|that ruby does. One interesting wrinkle is that where ruby uses the
|underscore ("_") suffix to enforce native lengths, perl uses a bang
|("!"). I don't know why that's different.

Because Perl used underscore when I implemented it. Besides, Ruby
allows "!" as well.

matz.
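
A quick way to check the point about the two suffixes is to pack the same value with each and compare the results. This is only a sketch; the variable names are illustrative, and the size of a native long depends on the platform.

```ruby
# '_' and '!' both request the platform's native size for the directive.
with_underscore = [-2].pack('s_')
with_bang = [-2].pack('s!')
same_short = (with_underscore == with_bang)

# Without a suffix 'l' is always 32 bits; with one it follows the C long.
fixed_long_size = [0].pack('l').bytesize
native_long_size = [0].pack('l!').bytesize
same_long = ([0].pack('l_') == [0].pack('l!'))

puts "s_ == s! : #{same_short}"
puts "l is #{fixed_long_size} bytes, l! is #{native_long_size} bytes"
```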

baumanj · Mar 31, 2006

Oh? I don't think that's mentioned in the rdoc for pack/unpack. In any
case, do you have any thoughts about adding these other formats?

baumanj · Mar 31, 2006

Oops, 'y' can't be both signed network long and signed little-endian
long. How about this:

S unsigned native short
s signed native short
n unsigned network short
o ? signed network short
v unsigned little-endian short
r ? signed little-endian short

I unsigned native int
i signed native int
J ? unsigned network int
j ? signed network int
K ? unsigned little-endian int
k ? signed little-endian int

L unsigned native long
l signed native long
N unsigned network long
O ? signed network long
V unsigned little-endian long
R ? signed little-endian long
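
Until something like this exists, the proposed signed short and long formats can be emulated on top of the existing unsigned directives. The following is only a sketch: the PROPOSED table and the unpack_proposed helper are hypothetical names made up for illustration, and the letters o, r, O and R are the proposal above, not directives Ruby actually understands.

```ruby
# Hypothetical emulation of the proposed letters:
# proposed letter => [existing unsigned directive, bit width]
PROPOSED = {
  'o' => ['n', 16],  # signed network short
  'r' => ['v', 16],  # signed little-endian short
  'O' => ['N', 32],  # signed network long
  'R' => ['V', 32],  # signed little-endian long
}.freeze

# Unpack one value with the unsigned directive, then reinterpret it
# as a two's-complement signed number of the given width.
def unpack_proposed(data, letter)
  directive, bits = PROPOSED.fetch(letter)
  value = data.unpack(directive).first
  value >= 2**(bits - 1) ? value - 2**bits : value
end

puts unpack_proposed("\xFF\xFE".b, 'o')
puts unpack_proposed("\xFE\xFF".b, 'r')
puts unpack_proposed("\x80\x00\x00\x00".b, 'O')
puts unpack_proposed("\x01\x00\x00\x00".b, 'R')
```

The int variants (j, J, k, K) are left out because their width depends on the platform's native int.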

Yukihiro Matsumoto · Apr 2, 2006

Hi,

In message "Re: unpack signed short in network (big-endian) byte order"

|However, of these 18 formats, 8 are missing. There are more than enough
|unused characters to specify them (j J k K o O r R t T W y Y z). I
|tried to come up with consistent mappings based on what already exists
|and this is what I came up with (?s indicate that the format is not
|currently supported):

The pack templates are derived from Perl, and I don't want to add our
own templates. I just talked with Larry Wall, who gave a presentation at
YAPC::Asia Tokyo, and he said he was not yet sure how to deal with pack
templates in Perl6. Maybe we can reach an agreement to add new pack
templates, or we can think of another feature to pack/unpack binary
data, such as the struct module in Python.

matz.

Kero · Apr 8, 2006

|Perhaps this should be a new subject like "adding formats for
|pack/unpack". I gave this a bit of thought, and there are quite a few
|formats that seem to be missing. I would naturally assume that there
|should be support for each combination of
|
|{unsigned, signed} x {native, network, little-endian} x {short, int,
|long}

8/16/32/64 bits int [in C, native short can be 16 bits, need not be]
16(?)/32/64 bits float [signedness irrelevant]
ascii chars, other string encodings(?) [signedness and endianness irrelevant?]
Then there are Endian formats that you wouldn't create in your nightmares.

However, is there any point for "native short" to allow for different
endianness? It's no longer the native short...

A problem is that single characters do not scale, you'll run out of the
alphabet eventually.

clear(?) but absolutely-not-concise suggestion:
"stuf".unpack(Integer, :32bits, :signed, :obscure_endianness)
file.read(Native::sizeof(:int) * 3).unpack(:int, 3)

Perhaps you can change those modifiers to something like "-32oI" and "i3"
but then I fear backwards compatibility issues.

Bye,
Kero.

baumanj · Apr 10, 2006

Kero said:
|8/16/32/64 bits int [in C, native short can be 16 bits, need not be]
|16(?)/32/64 bits float [signedness irrelevant]
|ascii chars, other string encodings(?) [signedness and endianness irrelevant?]
|Then there are Endian formats that you wouldn't create in your nightmares.
|
|However, is there any point for "native short" to allow for different
|endianness? It's no longer the native short...

I'm not talking about allowing different endianness for native. Native
is one of the endianness options: it varies depending upon the byte
ordering of the underlying system.
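
To illustrate what "native" means here, the native directive can be compared against the two fixed-order directives on the machine at hand. A small sketch, with illustrative variable names:

```ruby
# Pack the same 16-bit value with the native, little-endian and
# network (big-endian) unsigned short directives.
native = [1].pack('S')
little = [1].pack('v')
network = [1].pack('n')

# On any given host, the native result matches exactly one of the others.
host_order = (native == little) ? :little : :big

puts "this host is #{host_order}-endian"
```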

|A problem is that single characters do not scale, you'll run out of the
|alphabet eventually.

We'll just start using kana in i-ro-ha order once we run out of latin
letters.

(joke)

|clear(?) but absolutely-not-concise suggestion:
|"stuf".unpack(Integer, :32bits, :signed, :obscure_endianness)
|file.read(Native::sizeof(:int) * 3).unpack(:int, 3)

I wouldn't be opposed to adding a clone of the Python struct module,
which approaches this a bit differently (see the link in my previous
message), but I think we should keep this as a separate option for
compatibility with old programs.

I've actually implemented my suggestion and submitted a patch to matz.
I'm just waiting to hear back. If anyone else would like to see the
patch and give feedback, let me know.
