StringScanner question

Jon A. Lambert · Sep 18, 2005

Dear Ruby,

------------------------------------------------- StringScanner#get_byte
get_byte()
------------------------------------------------------------------------
Scans one byte and returns it. Similar to, but not the same as,
#getch.

s = StringScanner.new('ab')
s.getch # => "a"
s.getch # => "b"
s.getch # => nil

---------------------------------------------------- StringScanner#getch
getch()
------------------------------------------------------------------------
Scans one character and returns it.

s = StringScanner.new('ab')
s.get_byte # => "a"
s.get_byte # => "b"
s.get_byte # => nil

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 byte integers and other random binary cruft.
Now I haven't noticed anything out of the ordinary using getch but the
implied threats in the RI doc have me worried.

Anyone know what the difference is, if any?

Thanks

Jon A. Lambert · Sep 18, 2005

Jon said:
Anyone know what the difference is, if any?

Dear Jon,

If you had bothered to read the source code you would have found a
bunch of slick character encoding tables in regex.c and know that
the lengths of characters in strings are dependent on the encoding
options you be running on. As long as you be using US-ASCII then
you'll be alright, but if you start messing with kanji you'll be bitten on
the ass as StringScanner will suddenly be popping and hopping
through 1, 2, or n bytes at a time with getch. So I'd recommend
using getbyte.

There are enough hints about such things dropped in the very first
chapters of the "Coding Ruby: The Canonical Coder's Guide".
Pay attention and do some research before wasting our time.

Eric Hodel · Sep 18, 2005

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 byte integers and other random binary
cruft. Now I haven't noticed anything out of the ordinary using
getch but the implied threats in the RI doc have me worried.
Anyone know what the difference is, if any?

From looking at strscan.c, getch seems to be able to process
multibyte characters.

Use get_byte.
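
A minimal sketch of the difference, assuming a current Ruby in which every string carries its own encoding (this thread predates that; in 2005 the behaviour depended on the interpreter's encoding option). The sample string is made up for illustration:

```ruby
require 'strscan'

# "\u00E9" (e with an acute accent) takes two bytes in UTF-8: 0xC3 0xA9.
text = "\u00E9!"

by_char = StringScanner.new(text)
first_char = by_char.getch        # consumes the whole two-byte character
pos_after_getch = by_char.pos     # pos counts bytes, so this is 2

by_byte = StringScanner.new(text)
first_byte = by_byte.get_byte     # consumes only the lead byte 0xC3
pos_after_get_byte = by_byte.pos  # 1

puts first_char.bytes.inspect     # [195, 169]
puts first_byte.bytes.inspect     # [195]
```

With pure ASCII data the two calls behave the same, which would explain why getch looked fine at first.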

Logan Capaldo · Sep 18, 2005

On Sep 17, 2005, at 10:32 PM, Jon A. Lambert wrote:

[snip docs]

I'm using StringScanner to process network packets, and want to know
whether I should be using getch or getbyte to decode them, especially
since I have 16 and 32 byte integers and other random binary
cruft. Now I haven't noticed anything out of the ordinary using
getch but the implied threats in the RI doc have me worried.
Anyone know what the difference is, if any?
Thanks

Have you considered looking at String#unpack? It's designed for all
that "random binary cruft".
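
As a rough illustration of that suggestion, here is a sketch that decodes a 16-bit and a 32-bit unsigned integer in network (big-endian) byte order. The packet layout and the variable names are invented for the example and are not Jon's actual protocol:

```ruby
# Build a hypothetical 6-byte packet: a 16-bit field followed by a 32-bit field.
# 'n' is a 16-bit unsigned big-endian integer, 'N' a 32-bit one.
packet = [0x0102, 0x0A0B0C0D].pack('nN')

# unpack reads the fields back in the same order.
length, sequence = packet.unpack('nN')

puts packet.bytes.inspect  # [1, 2, 10, 11, 12, 13]
puts length                # 258
puts sequence              # 168496141
```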

Joe Van Dyk · Sep 19, 2005

Dear Jon,

If you had bothered to read the source code you would have found a
bunch of slick character encoding tables in regex.c and know that
the lengths of characters in strings are dependent on the encoding
options you be running on. As long as you be using US-ASCII then
you'll be alright, but if you start messing with kanji you'll be bitten on
the ass as StringScanner will suddenly be popping and hopping
through 1, 2, or n bytes at a time with getch. So I'd recommend
using getbyte.

There are enough hints about such things dropped in the very first
chapters of the "Coding Ruby: The Canonical Coder's Guide".
Pay attention and do some research before wasting our time.

It's necessary now to read C source code to figure out the API for
StringScanner?

Gavin Kistner · Sep 19, 2005

It's necessary now to read C source code to figure out the API for
StringScanner?

To be clear, I believe Jon's harsh response was written in response
to himself. He was saying "Oops, I figured it out myself."

Joe Van Dyk · Sep 19, 2005

To be clear, I believe Jon's harsh response was written in response
to himself. He was saying "Oops, I figured it out myself."

Yeah, I noticed that. But still, it shouldn't be necessary to read
source code to figure out API documentation.

Jon A. Lambert · Sep 19, 2005

Joe said:
It's necessary now to read C source code to figure out the API for
StringScanner?

Apparently 'twas "necessary" in the practical, "Well I had to", rather than
the idealistic "Well I oughta not had to".

Jon A. Lambert · Sep 19, 2005

Logan said:
Have you considered looking at String#unpack ? Its designed for all
that "random binary cruft"

Yes, I am using String#unpack after gathering up all the bytes together to do
it. Unfortunately StringScanner doesn't have an unpack method, which
would be quite a handy and fine addition to the class. StringScanner saves
me the hassle of writing a bunch of lexical navigation code.
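
One way the "gather the bytes, then unpack" approach might look is sketched below. The helper scan_unpack is hypothetical (StringScanner has no such method), and the packet contents are made up for the example:

```ruby
require 'strscan'

# Hypothetical helper, not part of StringScanner: collect `size` bytes
# with get_byte, then hand them to String#unpack with the given format.
def scan_unpack(scanner, size, format)
  bytes = ''.b  # start from an empty binary string
  size.times do
    byte = scanner.get_byte
    raise ArgumentError, 'packet too short' if byte.nil?
    bytes << byte
  end
  bytes.unpack(format)
end

# Made-up packet: 16-bit field, 32-bit field, then two bytes of payload.
packet = [0x0102, 0x0A0B0C0D, 'hi'].pack('nNa2')
scanner = StringScanner.new(packet)

short_field = scan_unpack(scanner, 2, 'n').first
long_field  = scan_unpack(scanner, 4, 'N').first
payload     = scanner.rest

puts short_field  # 258
puts long_field   # 168496141
puts payload      # hi
```

Because the helper only advances the scanner through get_byte, the remaining input can still be handled with the usual scan and check calls.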