Weird error using String#[]

Jason Mcdonald

Please check out the attached file. I am writing a script to notify me
when a few select items become available. It hits a web page then parses
the information in order to determine whether the item is available or
not.

When I parse the values out I start seeing some really weird results
when calling String#[]. What is even weirder is that when I print
these results with something like puts "val: #{weird_val}", the output
also overwrites part of the literal text being printed, "val: ".

Example:

ret = res[spos, 90]
puts "ret: #{ret}"
# ^^^^^
# Expected live result (works in baseline):
# ret: id="ProdAvailability"><span style="font-weight: bold; color:
# #000;">Availability:</span>Ou
#
# Actual live result (missing Ou on end, r in pos 0 replaced with O):
# Oet: id="ProdAvailability"><span style="font-weight: bold; color:
# #000;">Availability:</span>

If I pull the contents from the web site, it doesn't work. If I pull the
contents from a string saved in the script (denoted as baseline in the
file), it works fine.

I have been spinning my wheels for 2 days now and am pretty sure that I
am overlooking something obvious.

Anyone have any idea what is causing this?

Attachments:
http://www.ruby-forum.com/attachment/5731/availability_watcher.rb
 
Jason Mcdonald

Nokogiri *is* easier... (see below)

I would still like to know what exactly is causing the weird behavior in
my original post though, if anyone knows. I can understand why encoding
would result in incorrect parsing, but I don't understand why the
encoding would mess up the hard coded portion of the call to puts still.

Working Nokogiri example:

require 'rubygems'
require 'nokogiri'
require 'open-uri'


doc = Nokogiri::HTML(open("http://www.pennstateind.com/store/PKPARK-MAG.html"))
#puts doc
ret = doc.at("div#ProdAvailability")
puts "ret: #{ret}"

# Output:
# ret: <div id="ProdAvailability">
# Outof Stock / Eta Mid January <a
# href="http://www.pennstateind.com/mm5/merchant.mvc?Screen=shippingdelivery&amp;Product_Code=PKPARK-MAG"
# onclick="link_popup(this,'width=500,height=600,toolbar=no,scrollbars=yes');
# return false;">See Shipping Details</a><br>
# </div>
 
Robert Klemme

> Nokogiri *is* easier... (see below)

Certainly!

> I would still like to know what exactly is causing the weird behavior in
> my original post though, if anyone knows. I can understand why encoding
> would result in incorrect parsing, but I don't understand why the
> encoding would mess up the hard coded portion of the call to puts still.

Can you provide a small program that exhibits the effect you are
seeing? It is especially important to see how you calculate indexes.

Maybe this can help to illustrate a possible scenario:

Ruby version 1.9.2
irb(main):001:0> s = "aä"
=> "aä"
irb(main):002:0> s.encoding
=> #<Encoding:UTF-8>
irb(main):003:0> x = s.dup
=> "aä"
irb(main):004:0> x.encoding
=> #<Encoding:UTF-8>
irb(main):005:0> x.force_encoding "BINARY"
=> "a\xC3\xA4"
irb(main):006:0> x.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):007:0> x[1,1]
=> "\xC3"
irb(main):008:0> s[1,1]
=> "ä"
irb(main):009:0>
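
The same point can be written as a small standalone script. This is only a sketch: the sample string is made up, and it assumes a Ruby recent enough to have String#byteslice (1.9.3 or later). It shows that the same index means different things depending on whether the string is treated as characters or as bytes.

```ruby
# Character offsets and byte offsets diverge as soon as a string holds a
# multibyte character. The sample string is invented for illustration.
s = "a\u00E4b"                  # "a", a-umlaut, "b": 3 characters, 4 bytes in UTF-8

char_count = s.length           # counts characters
byte_count = s.bytesize         # counts bytes

by_char = s[1, 1]               # character-based slice: the whole a-umlaut
by_byte = s.byteslice(1, 2)     # byte-based slice covering both bytes of it

# Re-tagging a copy as binary makes String#[] count bytes instead.
binary = s.dup.force_encoding(Encoding::BINARY)
first_byte_only = binary[1, 1]  # a single raw byte, half of the character

p char_count, byte_count, by_char, by_byte, first_byte_only
```

If offsets are computed on one representation and applied to the other, the slice can start or end in the wrong place.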

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Jason Mcdonald

Thanks, Robert. The original post has the script with both expected and
unexpected outcomes. What you show with the encoding screwing up the
offsets makes total sense.

What I'm at a loss for is why it affects the hard coded portion of the
string passed to puts:

Example:
puts "ret: #{ret}"

Output:
Oet: [part but not all of the expected string - 2 chars too short]

At this point I plan on using Nokogiri but I am really curious what is
causing what I describe above. This is a weirdness for how strings /
puts works that I'd like to understand and keep in mind going forward.

Thanks!
 
Robert Klemme

> Thanks, Robert. The original post has the script with both expected and
> unexpected outcomes.

I thought more of a small script which does not need network
connection etc. and rather works with static text.

> What you show with the encoding screwing up the
> offsets makes total sense.

:)

> What I'm at a loss for is why it affects the hard coded portion of the
> string passed to puts:
>
> Example:
> puts "ret: #{ret}"
>
> Output:
> Oet: [part but not all of the expected string - 2 chars too short]

Well, that's easy:

irb(main):014:0> s = "\rA\tB"
=> "\rA\tB"
irb(main):015:0> puts "ret: #{s}"
Aet: B
=> nil
irb(main):016:0> p "ret: #{s}"
"ret: \rA\tB"
=> "ret: \rA\tB"
irb(main):017:0> s = "\rAet: B"
=> "\rAet: B"
irb(main):018:0> puts "ret: #{s}"
Aet: B
=> nil
irb(main):019:0> p "ret: #{s}"
"ret: \rAet: B"
=> "ret: \rAet: B"

To debug you should use p and not puts.

> At this point I plan on using Nokogiri but I am really curious what is
> causing what I describe above. This is a weirdness for how strings /
> puts works that I'd like to understand and keep in mind going forward.

It's probably rather about how your terminal works than how strings work. :)

Kind regards

robert
 
J-H Johansen

Hi there,

Normally when I see similar behaviour it's because of "hidden" characters.

Do you have a hidden \r (0x0D, decimal 13) in the text you're reading?
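
One way to check for such characters is to scan the string and report where they sit. The helper below is a hypothetical sketch (the name hidden_chars and the sample text are made up, not taken from the attached script), and it assumes the text is validly encoded and that String#match? is available (Ruby 2.4 or later).

```ruby
# Hypothetical helper: list every control character in a string together
# with its character offset. Assumes the text is validly encoded.
def hidden_chars(text)
  found = []
  text.each_char.with_index do |ch, i|
    # [[:cntrl:]] matches control characters such as \r, \n and \t
    found << [i, ch.inspect] if ch.match?(/[[:cntrl:]]/)
  end
  found
end

# Made-up stand-in for a fragment of the fetched page.
sample = "Availability:</span>\r\nOut of Stock"
hits = hidden_chars(sample)
p hits
```

Each entry pairs an offset with the escaped form of the character, so a stray \r shows up explicitly instead of silently moving the cursor.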


 
Jason Mcdonald

Robert,

That example shows the same behavior in my console as you show above. So
it is the \r that is causing it, it seems. I suppose the console sees
the \r and tries to create a new line, can't, and overwrites what is
there? The reason that it is one character too short is because the \r
would count as 1.

Thanks for the example!
 
Robert Klemme

> That example shows the same behavior in my console as you show above. So
> it is the \r that is causing it, it seems. I suppose the console sees
> the \r and tries to create a new line, can't, and overwrites what is
> there?

No, \n is newline, \r is carriage return which simply positions the
cursor at the beginning of the line.
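
The overwrite effect can be modelled without a terminal. The function below is a deliberately rough sketch of a single terminal line (render_line is a made-up name, and real terminals handle many more control codes): "\r" moves the cursor back to column 0 and later characters overwrite whatever is already there.

```ruby
# Rough model of one terminal line: "\r" moves the cursor back to column 0,
# and every other character overwrites whatever is at the cursor position.
def render_line(text)
  screen = []
  col = 0
  text.each_char do |ch|
    if ch == "\r"
      col = 0            # carriage return: back to the start of the line
    else
      screen[col] = ch   # overwrite the cell under the cursor
      col += 1
    end
  end
  screen.join
end

# Made-up value: a slice that ends with a carriage return followed by "O".
shown = render_line("ret: abc\rO")
puts shown
```

The leading "r" of "ret: " is replaced by the character that follows the \r, which is the same kind of symptom as the "Oet: " output described earlier in the thread.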
> The reason that it is one character too short is because the \r
> would count as 1.

To see what's really in the string you should use p or #inspect.
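
As a small illustration of that advice, the sketch below prints the escaped form of a value and, as one possible cleanup, removes carriage returns before display. The value of ret here is invented to stand in for the fetched slice; whether dropping \r is appropriate depends on the data.

```ruby
# Made-up value standing in for the slice taken from the fetched page.
ret = "Availability:</span>\rOu"

visible = ret.inspect        # escapes control characters, the same form p prints
cleaned = ret.delete("\r")   # one possible cleanup: drop carriage returns

puts "ret: #{visible}"
puts "ret: #{cleaned}"
```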
> Thanks for the example!

You're welcome!

Kind regards

robert
 
