String#unpack and null-terminated strings

Michael Neumann · Apr 24, 2004

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

Regards,

Michael

Index: pack.c
===================================================================
RCS file: /src/ruby/pack.c,v
retrieving revision 1.69
diff -r1.69 pack.c
1287a1288,1290
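For comparison, current Ruby releases give the `Z*` directive the behaviour asked for here; it is the behaviour the thread below eventually settles on. A minimal check, assuming a modern Ruby:

```ruby
# 'Z*' reads bytes up to the next NUL and then skips that NUL,
# so two consecutive C strings come out of a single unpack call.
fields = "abc\000def\000".unpack("Z*Z*")
p fields  # => ["abc", "def"]
```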

Mike Stok · Apr 24, 2004

Michael Neumann said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

I know it's not String#unpack, but hope it helps.

Mike
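One caveat if `split` is used this way: `String#split` keeps an empty field between two adjacent NULs, but it drops trailing empty fields unless it is given a negative limit. A small sketch:

```ruby
data = "abc\000\000def\000"
# Two adjacent NULs produce an empty field in the middle, which is kept.
p data.split("\0")      # => ["abc", "", "def"]
# With a negative limit the empty field after the final NUL is kept too.
p data.split("\0", -1)  # => ["abc", "", "def", ""]
```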

Michael Neumann · Apr 25, 2004

Mike Stok said:
Michael Neumann said:

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

Sure, this works. But I want to mix it with other data types, like:

"\100String\000\100".unpack("CTC") # T=null-term string

# => [64, "String", 64]

Otherwise I have to write:

str = "\100String\000\100"
a, str = str.unpack("Ca*")
b, str = str.split("\000", 2)
c, _ = str.unpack("Ca*")

p [a, b, c] # => [64, "String", 64]

Which is a bit ugly.
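Under the `Z*` semantics adopted later in this thread, `Z*` fills the role of the hypothetical `T` directive above, so the mixed record decodes in one call. A sketch, assuming a Ruby that includes that change:

```ruby
record = "\100String\000\100"  # "\100" is octal for 64 ("@")
# C: 8-bit unsigned integer; Z*: bytes up to the next NUL, consuming the NUL
a, b, c = record.unpack("CZ*C")
p [a, b, c]  # => [64, "String", 64]
```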

Python's struct.unpack has an "s" format specifier which does exactly what
I want. Perl and Ruby don't have this.

http://www.python.org/doc/current/lib/module-struct.html

Regards,

Michael

Daniel Berger · Apr 25, 2004

Michael Neumann said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

Regards,

Michael

"abc\000def\000".unpack("A3xA3") # => ["abc","def"]

Using the example you later posted...

"\100String\000\100".unpack("CA6xC") # => [64,"String",64]

Regards,

Dan
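This works as long as the string lengths are fixed and known in advance. A quick sketch of what happens when the string in the same layout is shorter (behaviour as I understand current Ruby): `A6` takes whatever bytes remain, the following `x` then has nothing left to skip and `unpack` raises `ArgumentError`, whereas `CZ*C` adapts to the length.

```ruby
short = "\100Str\000\100"   # same layout as above, but with a 3-byte string

p short.unpack("CZ*C")      # => [64, "Str", 64]

begin
  short.unpack("CA6xC")     # A6 swallows the rest; x then runs off the end
rescue ArgumentError => e
  puts "CA6xC failed: #{e.message}"
end
```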

daz · Apr 25, 2004

Michael said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

[snip] diff -r1.69 pack.c

At the risk of being told to clear off and write my own spec.,
I think that an ambiguity has intruded into the designer's mind.

The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" here refer to a
string element within a packed field? The packed field just
happens to be a Ruby String.

If this is going to break code, I wish it could happen
from 1.9 onward.
As it is now, A and Z behave the way I would expect
A* and Z* to (i.e. * uses all remaining bytes).

There's String#rstrip for removing spaces and nulls from the
end of a String.

Unpack is very useful for decoding structures, but with the
current behaviour, a structure containing a null-terminated
string element breaks the flow, as Michael has highlighted.

Please, Matz.

daz
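To make the distinction concrete, this is how the string directives treat the same 6-byte field in a current Ruby, where the change discussed in this thread is in place: `a` keeps every byte, `A` strips trailing spaces and NULs, and `Z` stops at the first NUL. `String#rstrip`, mentioned above, gives the same result as `A` here.

```ruby
field = "abc \000\000"  # "abc", a space, then two NULs

p field.unpack("a6")  # => ["abc \x00\x00"]  raw bytes, nothing removed
p field.unpack("A6")  # => ["abc"]           trailing spaces and NULs stripped
p field.unpack("Z6")  # => ["abc "]          everything before the first NUL
p field.rstrip        # => "abc"             rstrip also drops trailing NULs
```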

nobu.nokada · Apr 25, 2004

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:

The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" here refer to a
string element within a packed field? The packed field just
happens to be a Ruby String.

Sounds nice.

Index: pack.c
===================================================================
RCS file: /cvs/ruby/src/ruby/pack.c,v
retrieving revision 1.69
diff -u -2 -p -r1.69 pack.c
--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000
@@ -435,5 +435,5 @@ static unsigned long utf8_to_uv _((char*
* X | Back up a byte
* x | Null byte
- * Z | Same as ``A''
+ * Z | Same as ``a'', except that null is added with *
*/

@@ -524,6 +524,9 @@ pack_pack(ary, fmt)
case 'A': /* ASCII string (space padded) */
case 'Z': /* null terminated ASCII string */
- if (plen >= len)
+ if (plen >= len) {
rb_str_buf_cat(res, ptr, len);
+ if (p[-1] == '*' && type == 'Z')
+ rb_str_buf_cat(res, nul10, 1);
+ }
else {
rb_str_buf_cat(res, ptr, plen);
@@ -1174,4 +1177,5 @@ infected_str_new(ptr, len, str)
* "abc \0\0abc \0\0".unpack('A6Z6') #=> ["abc", "abc "]
* "abc \0\0".unpack('a3a3') #=> ["abc", " \000\000"]
+ * "abc \0abc \0".unpack('Z*Z*') #=> ["abc ", "abc "]
* "aa".unpack('b8B8') #=> ["10000110", "01100001"]
* "aaa".unpack('h2H2c') #=> ["16", "61", 97]
@@ -1285,4 +1289,5 @@ infected_str_new(ptr, len, str)
* -------+---------+-----------------------------------------
* Z | String | with trailing nulls removed
+ * | | upto first null with *
* -------+---------+-----------------------------------------
* @ | --- | skip to the offset given by the
@@ -1377,5 +1382,13 @@ pack_unpack(str, fmt)
case 'Z':
if (len > send - s) len = send - s;
- {
+ if (star) {
+ char *t = s;
+
+ while (t < send && *t) t++;
+ rb_ary_push(ary, infected_str_new(s, t - s, str));
+ if (t < send) t++;
+ s = t;
+ }
+ else {
long end = len;
char *t = s + len - 1;
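For readers who prefer Ruby to C, the new `if (star)` branch in the patch above amounts to the following. This is only a sketch: `read_cstring` is a hypothetical name, and the buffer is converted with `String#b` so that character offsets and byte offsets coincide.

```ruby
# Hypothetical helper mirroring the patched 'Z*' branch of pack_unpack:
# take bytes from pos up to (not including) the first NUL, then step
# over that NUL. Returns [field, next_pos].
def read_cstring(buf, pos)
  nul = buf.index("\0", pos)
  if nul
    [buf[pos...nul], nul + 1]     # stop at the NUL and consume it
  else
    [buf[pos..-1], buf.bytesize]  # no NUL left: take the rest
  end
end

buf = "abc\0def\0".b
field, pos = read_cstring(buf, 0)
p [field, pos]  # => ["abc", 4]
field, pos = read_cstring(buf, pos)
p [field, pos]  # => ["def", 8]
```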

Michael Neumann · Apr 25, 2004

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:

The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" here refer to a
string element within a packed field? The packed field just
happens to be a Ruby String.

Sounds nice.

[patch]

That's exactly how I expected Z to behave. Thanks!

Regards,

Michael

daz · Apr 26, 2004

Nobu patched:

--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000

[...]

case 'Z':
if (len > send - s) len = send - s;
- {
+ if (star) {
+ char *t = s;
+
+ while (t < send && *t) t++;
+ rb_ary_push(ary, infected_str_new(s, t - s, str));
+ if (t < send) t++;
+ s = t;
+ }
+ else {

Combining that with recognition of the length specifier:

===============================

case 'Z':
{
char *t = s;

if (len > send-s) len = send-s;
while (t < s+len && *t) t++;
rb_ary_push(ary, infected_str_new(s, t-s, str));
if (t < send) t++;
s = star ? t : s+len;
}
break;

===============================

s = "abc\0def\0\0jkl\0"

s.unpack('Z2Z*Z*') #-> ["ab", "c", "def"]
s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]
s.unpack('Z7Z*Z*') #-> ["abc", "", ""]
s.unpack('Z8Z*Z*') #-> ["abc", "", "jkl"]
s.unpack('Z9Z*Z*') #-> ["abc", "jkl", ""]
s.unpack('Z*Z42') #-> ["abc", "def"]

daz
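The combined rule can be modelled in a few lines of Ruby to check the table above. This is a sketch, not the C code: `read_z` and `unpack_z_fields` are hypothetical names, a count of `nil` stands for `*`, and the buffer is treated as binary. As far as I know, current Ruby's `String#unpack` follows this same rule, so the last line compares against it.

```ruby
# Hypothetical model of the combined 'Z' rule. count == nil means 'Z*'.
def read_z(buf, pos, count)
  remaining = buf.bytesize - pos
  len = (count.nil? || count > remaining) ? remaining : count
  field = buf.byteslice(pos, len)
  nul = field.index("\0")
  text = nul ? field.byteslice(0, nul) : field  # stop at the first NUL in the field
  # Z*: consume the text plus its NUL; Zn: always consume the whole n bytes
  next_pos = count.nil? ? pos + text.bytesize + (nul ? 1 : 0) : pos + len
  [text, next_pos]
end

def unpack_z_fields(buf, counts)
  buf = buf.b
  pos = 0
  counts.map do |count|
    text, pos = read_z(buf, pos, count)
    text
  end
end

s = "abc\0def\0\0jkl\0"
p unpack_z_fields(s, [2, nil, nil])  # => ["ab", "c", "def"]
p unpack_z_fields(s, [6, nil, nil])  # => ["abc", "f", ""]
p unpack_z_fields(s, [nil, 42])      # => ["abc", "def"]
p s.unpack("Z6Z*Z*")                 # => ["abc", "f", ""]
```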

nobu.nokada · Apr 26, 2004

Hi,

At Mon, 26 Apr 2004 16:19:04 +0900,
daz wrote in [ruby-talk:98364]:

Combining that with recognition of the length specifier:

===============================

case 'Z':
{
char *t = s;

if (len > send-s) len = send-s;
while (t < s+len && *t) t++;
rb_ary_push(ary, infected_str_new(s, t-s, str));
if (t < send) t++;
s = star ? t : s+len;
}
break;

===============================

I had also considered it, but

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.

daz · Apr 26, 2004

Nobu said:
daz wrote in [ruby-talk:98364]:

Combining that with recognition of the length specifier:

I had also considered it, but

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated, so:

"abc\000de" and "abc\000\000\000" both ==> "abc"

Everything from "\000" to the end of the field is junk
because the user told us so by using 'Z'.

We don't need to apologise that pack didn't replace the
exact junk that was there before :-?
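With the rule as eventually adopted, this can be checked directly: in a fixed-width `Z` field, whatever follows the first NUL is ignored, while the `a` directive still shows the raw bytes. A sketch, assuming a current Ruby:

```ruby
# Two 6-byte fields holding the same C string followed by different junk.
p "abc\000de".unpack("Z6")        # => ["abc"]
p "abc\000\000\000".unpack("Z6")  # => ["abc"]
# The junk bytes are still visible with 'a', which keeps every byte.
p "abc\000de".unpack("a6")        # => ["abc\x00de"]
```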

Round trip:

s = "abc\000def\000\000jkl\000"
zf = 'Z6Z*Z*'

s.unpack(zf) #-> ["abc", "f", ""]
s.unpack(zf).pack(zf) #-> "abc\000\000\000f\000\000"
s.unpack(zf).pack(zf).unpack(zf) #-> ["abc", "f", ""]

The fixed width consumes the added zero padding bytes so
it doesn't create bogus extra fields.
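The round trip above can be reproduced in a current Ruby. The sketch also shows the point of Nobu's objection: the repacked string is stable under further round trips, but it is not byte-for-byte the original, because the junk after the first NUL is gone.

```ruby
s  = "abc\000def\000\000jkl\000"
zf = "Z6Z*Z*"

fields = s.unpack(zf)          # => ["abc", "f", ""]
packed = fields.pack(zf)       # Z6 pads "abc" with NULs; each Z* appends one NUL
p packed                       # => "abc\x00\x00\x00f\x00\x00"
p packed.unpack(zf) == fields  # => true
p packed == s                  # => false ("de" and the tail are not reproduced)
```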

---

To me, the result below seems _not_ to do what was requested:

s.unpack('Z6Z*Z*') #-> ["abc\000de", "f", ""]

I'm probably missing a crucial point here?

daz

nobu.nokada · Apr 27, 2004

Hi,

At Tue, 27 Apr 2004 06:54:03 +0900,
daz wrote in [ruby-talk:98456]:

s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated, so:

"abc\000de" and "abc\000\000\000" both ==> "abc"

Everything from "\000" to the end of the field is junk
because the user told us so by using 'Z'.

We don't need to apologise that pack didn't replace the
exact junk that was there before :-?

Hmmm, sounds reasonable.

Yukihiro Matsumoto · May 10, 2004

Hi,

In message "Re: String#unpack and null-terminated strings"

|Hmmm, sounds reasonable.

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

matz.

nobu.nokada · May 12, 2004

Hi,

At Mon, 10 May 2004 17:53:35 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:99719]:

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

What about 1.8?

Yukihiro Matsumoto · May 13, 2004

Hi.

In message "Re: String#unpack and null-terminated strings"

|At Mon, 10 May 2004 17:53:35 +0900,
|Yukihiro Matsumoto wrote in [ruby-talk:99719]:
|> I finally got time to consider this issue. Perl seems to work the way
|> Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?
|
|What about 1.8?

Hmm. Go ahead. I now think it's the only reasonable behavior for "Z"
with NUL containing strings.

matz.

daz · May 13, 2004

Yukihiro said:
Hi.

In message "Re: String#unpack and null-terminated strings"

|At Mon, 10 May 2004 17:53:35 +0900,
|Yukihiro Matsumoto wrote in [ruby-talk:99719]:
|> I finally got time to consider this issue. Perl seems to work the way
|> Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?
|
|What about 1.8?

Hmm. Go ahead. I now think it's the only reasonable behavior for "Z"
with NUL containing strings.

matz.

Thanks, Matz.

The plea below is now wasted.

===============================================================

Hi Nobu,

Good to see your return, as always.

As the change only affects 'Z'-types in Strings with embedded null(s),
the impact should be extremely low.

I'm trying to think of any kind of string which might contain
significant nulls but also has a null as terminator.

I've seen some where a null delimits fields and a double null terminates
the list, but that rare case might be the only one to break *iff* a
programmer had decided that the best method to use on that type of string
was unpack('Z*').

Embedded nulls are common when reading from binary files
(e.g. encoded characters) but I feel that it would never be a good idea
to strip _trailing_ nulls in that context.

Voting +1 for inclusion in 1.8, too. Much more usable.

Thanks,

daz

===============================================================
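The corner case daz mentions, a list whose entries are separated by NULs and which ends with a double NUL, does see `unpack('Z*')` return only the first entry under the new rule. A sketch of reading such a list instead, stepping through it with the `@` (absolute offset) directive; `read_nul_list` is a hypothetical name:

```ruby
# Hypothetical reader for a NUL-separated list ended by an extra NUL.
def read_nul_list(buf)
  items = []
  pos = 0
  while pos < buf.bytesize
    item = buf.unpack("@#{pos}Z*").first  # next C string starting at byte pos
    break if item.empty?                  # empty entry: the terminating NUL
    items << item
    pos += item.bytesize + 1              # skip the text and its NUL
  end
  items
end

list = "one\000two\000\000"
p list.unpack("Z*")    # => ["one"]  only the first entry
p read_nul_list(list)  # => ["one", "two"]
```

Because an empty entry and the terminating NUL look the same, this reader stops at the first empty entry, so a list that needs empty entries cannot use this format.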

Working with NON-NULL terminated strings	4	Jul 14, 2007
Are Strings automatically null terminated?	11	Jul 9, 2008
Reading null terminated strings in Java	9	Feb 4, 2009
strncpy() and null terminated strings	4	Apr 8, 2004
Null character and JavaScript strings	16	Mar 4, 2011
win32com, BSTR, and null terminated strings	5	Feb 5, 2006
ctypes and using c_char_p without NULL terminated string	0	Feb 26, 2007
Null-terminated strings with struct module?	2	Mar 6, 2004
