String#unpack and null-terminated strings

Michael Neumann

Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?


Regards,

Michael


Index: pack.c
===================================================================
RCS file: /src/ruby/pack.c,v
retrieving revision 1.69
diff -r1.69 pack.c
1287a1288,1290
 
Mike Stok

Michael Neumann said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

I know it's not String#unpack, but hope it helps.

Mike
 
Michael Neumann

Michael Neumann said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

You could use String#split e.g.

irb(main):001:0> "abc\000def\000".split(/\0/)
=> ["abc", "def"]

Sure this works. But I want to mix it with other data-types like:

"\100String\000\100".unpack("CTC") # T=null-term string

# => [64, "String", 64]

Otherwise I have to write:

str = "\100String\000\100"
a, str = str.unpack("Ca*")
b, str = str.split("\000", 2)
c, _ = str.unpack("Ca*")

p [a, b, c] # => [64, "String", 64]

Which is a bit ugly :)
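For comparison, here is a minimal sketch of the same decode with a single template. It assumes a Ruby version in which `Z*` reads up to the first NUL and then steps over it, which is the behaviour discussed further down this thread. The "T" directive above is hypothetical, so `Z*` stands in for it here.

```ruby
# Assumes Z* stops at the first NUL and consumes it (see the rest of the thread).
data = "\100String\000\100"   # "\100" is octal for byte 64 ("@")

first, name, last = data.unpack("CZ*C")

p [first, name, last]
```

On older interpreters, where Z only strips trailing NULs, the same template would not split the fields this way.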

Python's struct.unpack has an "s" format specifier which does exactly what
I want. Perl and Ruby don't have this.

http://www.python.org/doc/current/lib/module-struct.html

Regards,

Michael
 
Daniel Berger

Michael Neumann said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?


Regards,

Michael

"abc\000def\000".unpack("A3xA3") # => ["abc","def"]

Using the example you later posted...

"\100String\000\100".unpack("CA6xC") # => [64,"String",64]

Regards,

Dan
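A small sketch of where the fixed-width template stops helping: it only lines up when every string has the length written into the template. The second input below is a made-up example in which the first string is shorter, so the field boundaries drift.

```ruby
# Widths in the template match the data: both strings are 3 bytes long.
fixed = "abc\000def\000"
p fixed.unpack("A3xA3")

# Same template, but the strings are now 2 and 4 bytes long.
# A3 reads "ab\0" (trailing NUL stripped), x skips "c", A3 reads "def".
shifted = "ab\000cdef\000"
p shifted.unpack("A3xA3")
```

In the second call the "c" is silently dropped, which is the case a NUL-terminated directive is meant to handle.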
 
daz

Michael said:
Hi,

How can I unpack two or more consecutive C-strings with the
String#unpack method? Like this:

"abc\000def\000".unpack("??") # => ["abc", "def"]

Currently, this seems not to be possible. Any chance to get the
following patch applied, which implements exactly this?

[snip] diff -r1.69 pack.c


At the risk of being told to clear off and write my own spec.,
I think that an ambiguity has intruded into the designer's mind.

The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" relate here to a
string element within a packed field? The packed field just
happens to be a Ruby String.

If this is going to break code, I wish that it could happen
from 1.9.
As it is now, A and Z are behaving the way I would expect
A* and Z* to (i.e. * uses all remaining elements).

There's String#rstrip for removing spaces and nulls from the
end of a String.

Unpack is very useful for decoding structures but with the
current behaviour if a structure were to contain a null-
terminated string element it would break the flow ...
.... as Michael has highlighted.

Please, Matz.


daz
 
nobu.nokada

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:
The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" relate here to a
string element within a packed field. The packed field just
happens to be a Ruby String.

Sounds nice.


Index: pack.c
===================================================================
RCS file: /cvs/ruby/src/ruby/pack.c,v
retrieving revision 1.69
diff -u -2 -p -r1.69 pack.c
--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000
@@ -435,5 +435,5 @@ static unsigned long utf8_to_uv _((char*
* X | Back up a byte
* x | Null byte
- * Z | Same as ``A''
+ * Z | Same as ``a'', except that null is added with *
*/

@@ -524,6 +524,9 @@ pack_pack(ary, fmt)
case 'A': /* ASCII string (space padded) */
case 'Z': /* null terminated ASCII string */
- if (plen >= len)
+ if (plen >= len) {
rb_str_buf_cat(res, ptr, len);
+ if (p[-1] == '*' && type == 'Z')
+ rb_str_buf_cat(res, nul10, 1);
+ }
else {
rb_str_buf_cat(res, ptr, plen);
@@ -1174,4 +1177,5 @@ infected_str_new(ptr, len, str)
* "abc \0\0abc \0\0".unpack('A6Z6') #=> ["abc", "abc "]
* "abc \0\0".unpack('a3a3') #=> ["abc", " \000\000"]
+ * "abc \0abc \0".unpack('Z*Z*') #=> ["abc ", "abc "]
* "aa".unpack('b8B8') #=> ["10000110", "01100001"]
* "aaa".unpack('h2H2c') #=> ["16", "61", 97]
@@ -1285,4 +1289,5 @@ infected_str_new(ptr, len, str)
* -------+---------+-----------------------------------------
* Z | String | with trailing nulls removed
+ * | | upto first null with *
* -------+---------+-----------------------------------------
* @ | --- | skip to the offset given by the
@@ -1377,5 +1382,13 @@ pack_unpack(str, fmt)
case 'Z':
if (len > send - s) len = send - s;
- {
+ if (star) {
+ char *t = s;
+
+ while (t < send && *t) t++;
+ rb_ary_push(ary, infected_str_new(s, t - s, str));
+ if (t < send) t++;
+ s = t;
+ }
+ else {
long end = len;
char *t = s + len - 1;
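Seen from the Ruby side, the `Z*` part of this patch can be sketched as below. This assumes an interpreter that includes the change; the first input is the example added to the documentation comment in the patch, and the `pack` calls are illustrative inputs of my own.

```ruby
# Unpack: with '*', each Z field runs up to the first NUL, which is then skipped.
p "abc \0abc \0".unpack("Z*Z*")

# Pack: with '*', a terminating NUL is appended after the string.
p ["abc"].pack("Z*")

# Pack with an explicit count pads with NULs up to that width.
p ["abc"].pack("Z5")
```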
 
Michael Neumann

Hi,

At Sun, 25 Apr 2004 14:34:03 +0900,
daz wrote in [ruby-talk:98298]:
The A and Z string field formats should IMO be recovered from
left to right. Doesn't the term "string" relate here to a
string element within a packed field. The packed field just
happens to be a Ruby String.

Sounds nice.

[patch]

That's exactly how I expected Z to behave. Thanks!

Regards,

Michael
 
daz

Nobu patched:
--- pack.c 18 Apr 2004 23:19:45 -0000 1.69
+++ pack.c 25 Apr 2004 06:39:33 -0000

[...]

case 'Z':
if (len > send - s) len = send - s;
- {
+ if (star) {
+ char *t = s;
+
+ while (t < send && *t) t++;
+ rb_ary_push(ary, infected_str_new(s, t - s, str));
+ if (t < send) t++;
+ s = t;
+ }
+ else {


Combining that with recognition of the length specifier:

===============================

case 'Z':
  {
    char *t = s;

    if (len > send-s) len = send-s;
    while (t < s+len && *t) t++;
    rb_ary_push(ary, infected_str_new(s, t-s, str));
    if (t < send) t++;
    s = star ? t : s+len;
  }
  break;

===============================

s = "abc\0def\0\0jkl\0"

s.unpack('Z2Z*Z*') #-> ["ab", "c", "def"]
s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]
s.unpack('Z7Z*Z*') #-> ["abc", "", ""]
s.unpack('Z8Z*Z*') #-> ["abc", "", "jkl"]
s.unpack('Z9Z*Z*') #-> ["abc", "jkl", ""]
s.unpack('Z*Z42') #-> ["abc", "def"]
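The table above can be reproduced with a small pure-Ruby model of that C branch. This is only a sketch for following the pointer arithmetic, not Ruby's implementation: `unpack_z_fields` is a hypothetical helper, and `:star` is an assumed stand-in for `*` in the template.

```ruby
# Hypothetical model of the proposed 'Z' branch; each entry in `fields`
# is either an Integer count or :star (standing in for '*').
def unpack_z_fields(data, fields)
  bytes = data.b                    # treat the input as raw bytes
  pos = 0                           # plays the role of the C pointer `s`
  fields.map do |count|
    remaining = bytes.bytesize - pos
    len = count == :star ? remaining : [count, remaining].min
    t = pos
    # Scan forward to the first NUL, but never past the field width.
    t += 1 while t < pos + len && bytes.getbyte(t) != 0
    field = bytes.byteslice(pos, t - pos)
    t += 1 if t < bytes.bytesize    # step past the byte where the scan stopped
    # '*' resumes right after the scan; a fixed width always consumes `len` bytes.
    pos = count == :star ? t : pos + len
    field
  end
end

s = "abc\0def\0\0jkl\0"
p unpack_z_fields(s, [2, :star, :star])
p unpack_z_fields(s, [6, :star, :star])
p unpack_z_fields(s, [:star, 42])
```

The key design point is the last assignment to `pos`: a counted field discards everything after its first NUL up to the field width, while a starred field ends immediately after its terminator.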



daz
 
nobu.nokada

Hi,

At Mon, 26 Apr 2004 16:19:04 +0900,
daz wrote in [ruby-talk:98364]:
Combining that with recognition of the length specifier:

===============================

case 'Z':
{
char *t = s;

if (len > send-s) len = send-s;
while (t < s+len && *t) t++;
rb_ary_push(ary, infected_str_new(s, t-s, str));
if (t < send) t++;
s = star ? t : s+len;
}
break;

===============================

I'd also considered that, but
s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.
 
daz

Nobu said:
daz wrote in [ruby-talk:98364]:
Combining that with recognition of the length specifier:

I'd also considered that, but
s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated so:

"abc\000de" == "abc\000\000\000" ==> "abc"

Everything from "\000" to the end of the field is junk
because the user told us so by using 'Z'.

We don't need to apologise that pack didn't replace the
exact junk that was there before :-?

Round trip:

s = "abc\000def\000\000jkl\000"
zf = 'Z6Z*Z*'

s.unpack(zf) #-> ["abc", "f", ""]
s.unpack(zf).pack(zf) #-> "abc\000\000\000f\000\000"
s.unpack(zf).pack(zf).unpack(zf) #-> ["abc", "f", ""]

The fixed width consumes the added zero padding bytes so
it doesn't create bogus extra fields.
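The round trip can be checked directly as a short script. This sketch assumes a Ruby in which `Z` with a count stops at the first NUL inside the field and `Z*` appends a NUL when packing, i.e. the behaviour argued for in this post.

```ruby
s  = "abc\000def\000\000jkl\000"
zf = "Z6Z*Z*"

fields   = s.unpack(zf)        # first pass over the original bytes
repacked = fields.pack(zf)     # padding bytes are regenerated as NULs

p fields
p repacked
p repacked.unpack(zf) == fields   # the decoded values survive the round trip
```

The bytes differ from the original string, but the decoded fields do not, which is the sense of "round trip" being argued for here.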

---

To me, the result below seems _not_ to do what was requested:

s.unpack('Z6Z*Z*') #-> ["abc\000de", "f", ""]


I'm probably missing a crucial point here?


daz
 
nobu.nokada

Hi,

At Tue, 27 Apr 2004 06:54:03 +0900,
daz wrote in [ruby-talk:98456]:
s = "abc\0def\0\0jkl\0"

s.unpack('Z6Z*Z*') #-> ["abc", "f", ""]

It can't round trip with Array#pack, so I discarded this plan.

But the user has specified that the first field is
fixed-width(6) and null-terminated so:

"abc\000de" == "abc\000\000\000" ==> "abc"

Everything from "\000" to the end of the field is junk
because the user told us so by using 'Z'.

We don't need to apologise that pack didn't replace the
exact junk that was there before :-?

Hmmm, sounds reasonable.
 
Yukihiro Matsumoto

Hi,

In message "Re: String#unpack and null-terminated strings"

|Hmmm, sounds reasonable.

I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

matz.
 
nobu.nokada

Hi,

At Mon, 10 May 2004 17:53:35 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:99719]:
I finally got time to consider this issue. Perl seems to work the way
Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?

What about 1.8?
 
Yukihiro Matsumoto

Hi.

In message "Re: String#unpack and null-terminated strings"

|At Mon, 10 May 2004 17:53:35 +0900,
|Yukihiro Matsumoto wrote in [ruby-talk:99719]:
|> I finally got time to consider this issue. Perl seems to work the way
|> Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?
|
|What about 1.8?

Hmm. Go ahead. I now think it's the only reasonable behavior for "Z"
with NUL containing strings.

matz.
 
daz

Yukihiro said:
Hi.

In message "Re: String#unpack and null-terminated strings"

|At Mon, 10 May 2004 17:53:35 +0900,
|Yukihiro Matsumoto wrote in [ruby-talk:99719]:
|> I finally got time to consider this issue. Perl seems to work the way
|> Daz described in [ruby-talk:98364]. Could you commit the changes, Nobu?
|
|What about 1.8?

Hmm. Go ahead. I now think it's the only reasonable behavior for "Z"
with NUL containing strings.

matz.


Thanks, Matz.



The plea below is now wasted :))

===============================================================

Hi Nobu,

Good to see your return, as always.


As the change only affects 'Z'-types in Strings with embedded null(s),
the impact should be extremely low.

I'm trying to think of any kind of string which might contain
significant nulls but also has a null as terminator.

I've seen some where null delimits fields and double-null terminates
but that rare case might be the only one to break *iff* a
programmer had decided that the best method to use on that type of string
was unpack('Z*').
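To make that rare case concrete, here is a sketch with a made-up double-null-terminated list. It assumes the new `Z*` behaviour; under the old one (trailing nulls removed), a single `Z*` would have kept the embedded null and everything after it.

```ruby
# Hypothetical data: fields delimited by NUL, list terminated by a double NUL.
list = "one\0two\0\0"

p list.unpack("Z*")      # new behaviour: only the first field comes back
p list.unpack("Z*Z*")    # one Z* per field recovers both
p list.split("\0")       # String#split also works when the field count is unknown
```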

Embedded nulls are common when reading from binary files
(e.g. encoded characters) but I feel that it would never be a good idea
to strip _trailing_ nulls in that context.

Voting +1 for inclusion in 1.8, also. Much more usable :)


Thanks,

daz

===============================================================
 
