do your bit for my mental health - how to find the difference between two strings?

Iain Barnett

Hi,

I have a piece of code in a Sinatra app I've been writing that doesn't work.


These two forms don't work:

get '/tweets/:service/' do
  mymethod( params[:service] )
  ...

get '/tweets/:service/' do
  service = params[:service]
  mymethod( service )
  ...


This works:

get '/tweets/:service/' do
  mymethod( 'servicename' )

so does this:

get '/tweets/:service/' do
  service = 'servicename'
  mymethod( service )


Using the params[:service] variable then mymethod fails to do what I
expect. If I hard-code the string that it's supposed to represent, it
works. Every way I've examined this variable it looks exactly the same
as a hard-coded version. Dump, inspect, to_s, length, class, equality,
they all give the answer I expect - it's a String, and it is the same
string that's in the URL that's been passed. I know that the method
being called works, the only thing that doesn't is this variable.
Nothing touches it before the method is called. I've tried via telnet
too, no difference.

To help me keep the small grain of sanity I (hope to) have remaining,
could anyone suggest a good way to find out what makes two strings
different?

I'm currently reading up on using the debugger, along with some stuff
on Rack Test, but if anyone has a useful insight I can use I'd be very,
very grateful. Would I be best off running a test through the debugger,
for example?

Regards
Iain
 
thunk

This is likely not what you are looking for directly, but it may give
you some ideas...

class String
  def levenshtein( other, ins=2, del=2, sub=1)
    # ins, del, sub are weighted costs
    return nil if self.nil?
    return nil if other.nil?
    dm = [] # distance matrix

    # Initialize first row values
    dm[0] = (0..self.length).collect { | i | i * ins }
    fill = [0] * (self.length - 1)

    # Initialize first column values
    for i in 1..other.length
      dm[i] = [i * del, fill].flatten
    end

    # Populate matrix
    for i in 1..other.length
      for j in 1..self.length
        # Critical comparison
        dm[i][j] = [dm[i-1][j-1] + (self[j-1] == other[i-1] ? 0 : sub),
                    dm[i][j-1] + ins,
                    dm[i-1][j] + del].min
      end
    end

    # The last value in the matrix is the Levenshtein distance between
    # the strings
    dm[other.length][self.length]
  end

end

def ls( ar, threshold=3 ) # Array must have at least 2 elements
  word1, word2, nRslt, lRslt = ar.first.to_s, ar[1].to_s, 999, false
  if ar.size == 2
    nRslt = word1.levenshtein( word2 )
    lRslt = nRslt <= threshold
  elsif ar.size > 2
    range = 1..ar.size - 1
    range.each do | n |
      word2 = ar[n]
      nRslt = word1.levenshtein( word2 )
      puts "word2 = " + word2.to_s + ", ls value = " + nRslt.to_s
      if nRslt <= threshold
        lRslt = true
        break
      end
    end
  end

  puts "word1 = " + word1.to_s + ", and word2 = " + word2.to_s +
       " Result = " + nRslt.to_s + " Passed? " + lRslt.to_s
  lRslt
end
 
Brian Candler

Iain Barnett wrote in post #968364:
I have a piece of code in a Sinatra app I've been
writing that doesn't work.

These two forms don't work:

get '/tweets/:service/' do
  mymethod( params[:service] )
  ...

get '/tweets/:service/' do
  service = params[:service]
  mymethod( service )
  ...


This works:

get '/tweets/:service/' do
  mymethod( 'servicename' )

so does this:

get '/tweets/:service/' do
  service = 'servicename'
  mymethod( service )


Using the params[:service] variable then mymethod fails to do what I
expect. If I hard-code the string that it's supposed to represent, it
works. Every way I've examined this variable it looks exactly the same
as a hard-coded version.

If you're using ruby 1.9, then there are hidden qualities to strings :-(
So I suggest you try:

STDERR.puts service.inspect, service.encoding

Apart from that, I guess it's possible that there's something else in
the program's environment which is different between your two test runs.
To minimise these differences, I suggest

service = params[:service]
service.replace('servicename') # <<<<
STDERR.puts service.inspect, service.encoding
mymethod( service )

Try running this with the marked line present, and with that line
commented out.
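A minimal sketch of the same idea for a newer Ruby: rather than eyeballing inspect output, compare the two strings attribute by attribute and report only what differs. `string_differences` is a hypothetical helper, the attribute list is an assumption (not an exhaustive set), and the ASCII-8BIT string merely stands in for params[:service].

```ruby
# Hypothetical diagnostic helper (not part of any library): returns a hash
# mapping each attribute name to the [a, b] values where they differ.
def string_differences(a, b)
  attributes = {
    class:          ->(s) { s.class },
    encoding:       ->(s) { s.encoding },
    bytes:          ->(s) { s.bytes },
    valid_encoding: ->(s) { s.valid_encoding? },
    ascii_only:     ->(s) { s.ascii_only? },
    frozen:         ->(s) { s.frozen? }
  }
  attributes.each_with_object({}) do |(name, reader), diffs|
    left, right = reader.call(a), reader.call(b)
    diffs[name] = [left, right] unless left == right
  end
end

literal     = "servicename".dup.force_encoding("UTF-8")
from_socket = "servicename".dup.force_encoding("ASCII-8BIT") # stand-in for params[:service]

puts literal == from_socket            # true
p string_differences(literal, from_socket)
```

Here the only reported difference is the encoding, which is exactly the kind of attribute that == and inspect do not show.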

HTH,

Brian.
 
Ryan Davis

This is likely not what you are looking for directly, but it may give
you some ideas...

class String
  def levenshtein( other, ins=2, del=2, sub=1)

WTF?!? Did you even read his mail?
 
thunk

WTF?!? Did you even read his mail?


I was thinking from the title that knowing the exact difference
between two strings could help him find the source of the problem.
The Levenshtein algorithm does this. I thought it could be useful and
I had a library handy with the source in it.

Actually, I have to wonder if you read mine: IT MAY GIVE YOU SOME
IDEAS. I don't get down to that level very often, so I really did
think that code might help him.

George
 
Hassan Schroeder

I was thinking from the title that knowing the exact difference
between two strings could help him find the source of the problem.
The Levenshtein algorithm does this.

Well... your example code tells me the Levenshtein distance between
'foo' and 'food' is 2, which would seem an iffy assertion.

But I'm no mathematician :)
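One likely reason for the 2: the posted method defaults to weighted costs (ins=2, del=2, sub=1), so a single insertion costs 2 rather than 1. The sketch below is a textbook dynamic-programming edit distance written for illustration (not the posted code) that takes the same kind of optional weights, so both numbers can be compared side by side.

```ruby
# Weighted edit distance for turning `source` into `target`.
# ins/del/sub are per-operation costs; all 1 gives the usual Levenshtein distance.
def weighted_levenshtein(source, target, ins: 1, del: 1, sub: 1)
  rows = source.length
  cols = target.length
  # dist[i][j] = cost of turning the first i chars of source
  # into the first j chars of target
  dist = Array.new(rows + 1) { Array.new(cols + 1, 0) }
  (1..rows).each { |i| dist[i][0] = i * del }
  (1..cols).each { |j| dist[0][j] = j * ins }

  (1..rows).each do |i|
    (1..cols).each do |j|
      same = source[i - 1] == target[j - 1]
      dist[i][j] = [
        dist[i - 1][j - 1] + (same ? 0 : sub), # match or substitute
        dist[i - 1][j] + del,                  # delete from source
        dist[i][j - 1] + ins                   # insert into source
      ].min
    end
  end
  dist[rows][cols]
end

puts weighted_levenshtein("foo", "food")                 # prints 1 (unit costs)
puts weighted_levenshtein("foo", "food", ins: 2, del: 2) # prints 2 (weights like the posted defaults)
```

With unit weights the result matches the usual definition; with insertion and deletion weighted at 2, one missing character costs 2.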
 
Ryan Davis

Actually, I have to wonder if you read mine: IT MAY GIVE YOU SOME
IDEAS.

Well... my mind is blown. Obviously, for me to call you out for not
having read the OP's email beyond the subject line, I'd have to read
your mail to get an understanding that your answer has

ABSOLUTELY NOTHING TO DO

with what he is asking.

On top of it, according to Hassan, apparently your algorithm is wrong.
Good job!
 
botp

These two forms don't work:

get '/tweets/:service/' do
  mymethod( params[:service] )
  ...

get '/tweets/:service/' do
  service = params[:service]
  mymethod( service )
  ...


1. Please show terminal output/screenshots.
2. Provide a small piece of code that reproduces the error (because I
cannot reproduce that behaviour here).

thanks and best regards -botp
 
thunk

Listen,

If he adopted the nearness tester for a quick true/false for equality,
he would get one of two results:

1. The test finds a difference, and in this case he could drill down on
the physical difference.

2. The test finds no difference, and then he could use it as a
work-around to get on with the app while the powers that be do what the
powers that be do sometime in the future.

I will swear that the code, which I may or may not have modified,
worked for me when I was testing it some two years ago. It struck me
as a very useful piece of code. It is out there somewhere under the
name given in the code, I should guess.

In any case it struck me as possibly useful, and if somebody took the
time to test it, then that person also thought so, no?

Clearly, if he can flip some system switch or whatever, then that is
the superior solution. I was the first to respond, and I was unaware
that the versions changed internal formats, but that does make sense.

In the worst case the algorithm gets down and dirty with the visible
elements of a string - or did back in earlier versions. That might
well have come in handy.

Really, Ryan, if you want to butt heads, let's take it off-line. I have
said all I feel I need to say after trying to help somebody.

Sincerely,

George
 
Robert Klemme

If he adopted the nearness tester for a quick true/false for equality,
he would get one of two results:

1. The test finds a difference, and in this case he could drill down on
the physical difference.

2. The test finds no difference, and then he could use it as a
work-around to get on with the app while the powers that be do what the
powers that be do sometime in the future.

As Brian has correctly pointed out, there are more ways that String
instances can differ than just character content. And the Levenshtein
distance is zero for Strings with identical content (which the OP
indicated). So there is no additional information to be gained from
calculating this measure.

George, my 0.02 EUR on this: admitting to having made a mistake and even
apologizing does not hurt. It has happened to me, it has happened to
others around here as well - it happens all the time to everybody. On
the contrary, trying to defend your initial posting, which is obviously
off the mark, is unlikely to earn you merit.

Kind regards

robert
 
thunk

Robert,

I really do understand your message, and thank you.

The details of the example did not make sense to me, among other
things I'm not using Sinatra yet. I readily confess this. But he
did say that he was looking to:

find the difference between two strings?

not just about, but EXACTLY. I would not be defensive if I had not
been called on this in such a rude fashion. Really. Being ignored is
a strong enough signal.

Logically, it still seems that there is a "work-around" solution in
that code somewhere, because he could have his own overriding equality
test on String. If they look the same, I'd sure bet that they would
pass this test whatever the deviation in the inner workings.

.....and BTW this code DOES work EXACTLY as submitted in Ruby 1.9 just
as it did for me 3 years ago in whatever version we were using back
then.

It seemed like it COULD be a useful piece of code to look at for a few
folks in any event.

You guys have been VERY helpful to me. I am also looking seriously at
using Sinatra, and this guy really did seem to be in a bit of
distress.... and bla bla bla

I did not expect to have somebody call me out and tell me the approach
was totally off the mark, and make the completely UNFOUNDED (and
untested) statement that the code does not work. It does, at least in
my version, 1.9.

Mr Ryan, it sure seems to me, went well over the top, and also was
just plain WRONG about the code if you should care to run it:


Way too much sideways motion. I won't let myself be provoked into any
more re-re-rebuttals; they do no good and are just so much noise.



class String
  def levenshtein( other, ins=2, del=2, sub=1)
    # ins, del, sub are weighted costs
    return nil if self.nil?
    return nil if other.nil?
    dm = [] # distance matrix

    # Initialize first row values
    dm[0] = (0..self.length).collect { | i | i * ins }
    fill = [0] * (self.length - 1)

    # Initialize first column values
    for i in 1..other.length
      dm[i] = [i * del, fill].flatten
    end

    # Populate matrix
    for i in 1..other.length
      for j in 1..self.length
        # Critical comparison
        dm[i][j] = [dm[i-1][j-1] + (self[j-1] == other[i-1] ? 0 : sub),
                    dm[i][j-1] + ins,
                    dm[i-1][j] + del].min
      end
    end

    # The last value in the matrix is the Levenshtein distance between
    # the strings
    dm[other.length][self.length]
  end

end

def ls( ar, threshold=3 ) # Array must have at least 2 elements
  word1, word2, nRslt, lRslt = ar.first.to_s, ar[1].to_s, 999, false
  if ar.size == 2
    nRslt = word1.levenshtein( word2 )
    lRslt = nRslt <= threshold
  elsif ar.size > 2
    range = 1..ar.size - 1
    range.each do | n |
      word2 = ar[n]
      nRslt = word1.levenshtein( word2 )
      puts "word2 = " + word2.to_s + ", ls value = " + nRslt.to_s
      if nRslt <= threshold
        lRslt = true
        break
      end
    end
  end

  puts "word1 = " + word1.to_s + ", and word2 = " + word2.to_s +
       " Result = " + nRslt.to_s + " Passed? " + lRslt.to_s
  lRslt
end

puts ls( ["Davis", "Bully"] )

puts ls( ["Bully", "Bully"] )

ruby1.9 t.rb
word1 = Davis, and word2 = Bully Result = 5 Passed? false
false
word1 = Bully, and word2 = Bully Result = 0 Passed? true
true
Exit code: 0


Sincerely,

George
 
Hassan Schroeder

I did not expect to have somebody call me out and tell me the approach
was totally off the mark, and make the completely UNFOUNDED (and
untested) statement that the code does not work.

It was me, not Ryan, who said it didn't work, and of course I tested it
before making that comment.

Or are you saying that you believe the Levenshtein distance between
"foo" and "food" is actually 2?

I would expect it to be 1, and that's also what a Levenshtein example
implementation in another language[1] gives me.

[1] http://www.merriampark.com/ldperl.htm

But as I said, I'm no mathematician :)
 
Brian Candler

thunk wrote in post #96851:
The details of the example did not make sense to me, among other
things I'm not using Sinatra yet. I readily confess this. But he
did say that he was looking to:

find the difference between two strings?

not just about, but EXACTLY.

Maybe the OP could have chosen a better subject heading, but it's fairly
clear you didn't read any of the actual post.

The point is, he's calling a function foo(x) where x is either a string
"bar" read from a Sinatra parameter, or a literal string "bar". The
method foo() is behaving *differently* in these two cases, even though
the strings are apparently equal. He wrote:

"Using the params[:service] variable then mymethod fails to do what I
expect. If I hard-code the string that it's supposed to represent, it
works. Every way I've examined this variable it looks exactly the same
as a hard-coded version. Dump, inspect, to_s, length, class, equality,
they all give the answer I expect - it's a String, and it is the same
string that's in the URL that's been passed."

In other words,
foo(x) != foo(y) even though x == y

Trying to use any algorithm to measure "nearness" is not going to help
when Ruby already told him the strings are equal. If you test them for
equality character-by-character, they'll still be equal.
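A quick way to convince yourself of this, as a sketch: two strings with the same characters but different encoding tags pass a character-by-character check, a length check and ==, and differ only when you ask for the encoding. The ASCII-8BIT string is an assumed stand-in for the Sinatra parameter.

```ruby
literal = "servicename".dup.force_encoding("UTF-8")
param   = "servicename".dup.force_encoding("ASCII-8BIT") # stand-in for params[:service]

# Compare pair by pair, the way a "nearness" test ultimately would
same_chars = literal.chars.zip(param.chars).all? { |x, y| x == y }

puts same_chars                          # true
puts literal.length == param.length      # true
puts literal == param                    # true
puts literal.encoding == param.encoding  # false: the only visible difference
```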

However there was a recent similar issue, where the significant
difference was that the strings had different encodings:
http://www.ruby-forum.com/topic/476119

The trouble is that Ruby 1.9 can say some strings are "==" when they
have the same byte content but different encodings, under some
circumstances which I won't attempt to describe here. And the hidden
encoding attribute may in turn influence the behaviour of library
functions that you call.
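One of those circumstances can be shown with a short experiment (a summary of observed behaviour on current Ruby, not a complete statement of the rules): strings with the same bytes but different encodings compare == when the content is ASCII-only, but not when it contains non-ASCII bytes.

```ruby
ascii_utf8   = "servicename".dup.force_encoding("UTF-8")
ascii_binary = "servicename".dup.force_encoding("ASCII-8BIT")

accented_utf8   = "caf\u00E9"                                  # UTF-8, one non-ASCII character
accented_binary = accented_utf8.dup.force_encoding("ASCII-8BIT") # same bytes, binary tag

puts ascii_utf8 == ascii_binary                     # true: ASCII-only content
puts accented_utf8 == accented_binary               # false: same bytes, non-ASCII content
puts accented_utf8.bytes == accented_binary.bytes   # true
```

So a parameter like "servicename" can sail through every == check while still carrying a different encoding tag into whatever library you call next.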

I would not be defensive if I had not
been called on this in such a rude fashion.

I don't condone rudeness, which has added more noise to this thread.
However you did make a thoughtless posting, apparently based on reading
only the subject line and not the content. Hence I can understand the
reaction.
 
Iain Barnett

Thanks for all the responses. Though I was looking for the difference
between two seemingly identical strings, I will keep the Levenshtein
distance algorithm around for another future project :)


service = params[:service]
service.replace('servicename') # <<<<
STDERR.puts service.inspect, service.encoding
mymethod( service )

Try running this with the marked line present, and with that line
commented out.

This bore some info. I'm running Ruby 1.9.2 (sorry, I should have
mentioned that) with Sinatra 1.1.0. With the replace call the
encoding is US-ASCII, or UTF-8 if I set `# encoding: utf-8` at the
top of the file.

Without the replace call the encoding is ASCII-8BIT. I've been
searching around for how to make all this work now that I know this,
and poking around in the Sinatra and Rack source, but haven't found a
(working) answer yet. I'll post it when I do.
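The experiment can be reproduced in plain Ruby, which also explains why the marked line changed the reported encoding: String#replace copies the argument's encoding along with its bytes. A sketch, using force_encoding on a literal as an assumed stand-in for the Sinatra parameter:

```ruby
service = "servicename".dup.force_encoding("ASCII-8BIT") # stand-in for params[:service]
puts service.encoding

# replace copies both the content and the encoding of its argument
service.replace("servicename".dup.force_encoding("US-ASCII"))
puts service.encoding   # US-ASCII

service.replace("servicename".dup.force_encoding("UTF-8"))
puts service.encoding   # UTF-8
```

The text never changes here; only the encoding tag does, and it follows whichever string was passed to replace.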

Thanks very much for the help.

Regards,
Iain
 
Brian Candler

Iain Barnett wrote in post #968894:
service = params[:service]
service.replace('servicename') # <<<<
STDERR.puts service.inspect, service.encoding
mymethod( service )

Try running this with the marked line present, and with that line
commented out.

this bore some info. I'm running Ruby 1.92 (sorry, I should have
mentioned that) with Sinatra 1.1.0. With the replace function the
encoding is US-ASCII, or UTF-8 if I set the `# encoding: utf-8` at the
top of the file.

Without the replace function the encoding is ASCII-8BIT. I've been
searching around for how to make all this work now I know this, and
poking around in the Sinatra and Rack source, but haven't found a
(working) answer yet. I'll post it when I do.

OK, well, you should now be able to replicate this without Sinatra:

foo = "servicename"
foo.force_encoding("ASCII-8BIT")
mymethod(foo)
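To see how a method can behave differently for two strings that are ==, here is a sketch with a hypothetical `mymethod` that branches on the encoding tag. The rule it applies (binary-tagged input treated as a blob, so it never matches stored text) is an assumption chosen to mimic the kind of database-driver behaviour discussed in this thread; it is not the sqlite3 gem's code.

```ruby
# Hypothetical stand-in for a library call that looks at the encoding tag,
# e.g. a driver that stores ASCII-8BIT strings as binary blobs instead of text.
def mymethod(name)
  kind = name.encoding == Encoding::ASCII_8BIT ? :blob : :text
  # Pretend lookup: the stored value "servicename" is text, so only text matches
  kind == :text && name == "servicename" ? "found #{name}" : "no match"
end

foo = "servicename".dup
foo.force_encoding("ASCII-8BIT")

puts foo == "servicename"     # true
puts mymethod("servicename")  # found servicename
puts mymethod(foo)            # no match
```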

Note that Sinatra gives you a parameter tagged as ASCII-8BIT, more or
less by accident (i.e. it's not documented as far as I know); it's
simply that Ruby defaults to ASCII-8BIT when reading data from a socket,
as opposed to from a file.

If you really want the gory details, I attempted to document about 200
ruby 1.9 string behaviours at
https://github.com/candlerb/string19/blob/master/string19.rb

Anyway, you can now investigate what's going on in mymethod() which
might be affected by the encoding. If you are using sqlite3 inside that
method, you may have hit upon the bug already discovered at
http://www.ruby-forum.com/topic/476119

If you don't want to understand what's going on, you can do
service.force_encoding("UTF-8")
and cross your fingers. It might fix the problem, or it might be hiding
some other more serious problem which will explode on you at some
indeterminate time in the future.
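If you do take the force_encoding route, one way to reduce the risk of silently hiding a problem is to re-tag a copy and check that the bytes really are valid UTF-8 before using them. A sketch only: the helper name `utf8_param` and the choice to raise ArgumentError are assumptions, not anything Sinatra or Rack provides.

```ruby
# Hypothetical helper: re-tag a (binary) parameter as UTF-8, but refuse
# input whose bytes are not valid UTF-8 instead of passing it along.
def utf8_param(raw)
  tagged = raw.dup.force_encoding(Encoding::UTF_8) # dup: leave the original untouched
  unless tagged.valid_encoding?
    raise ArgumentError, "parameter is not valid UTF-8: #{raw.bytes.inspect}"
  end
  tagged
end

ok = utf8_param("servicename".dup.force_encoding("ASCII-8BIT"))
puts ok.encoding # UTF-8

begin
  utf8_param("\xFF\xFE".dup.force_encoding("ASCII-8BIT"))
rescue ArgumentError => e
  puts e.message
end
```

Raising loudly is a design choice: the failure shows up at the request boundary instead of somewhere deep inside the database layer.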

But as far as I'm concerned, this is all total crap. I stick with ruby
1.8 for exactly this reason.
 
Iain Barnett


Yep, I am using Sqlite3 so I tried the force_encoding just after I
replied and bingo! You're right. I'm going to look into Rack Request and
Response and perhaps patch them to enforce it there, but for now I'm a
lot happier. It is a stupid bug, but I like the lambda syntax in 1.9 so
I'm not giving it up :)

I'll take a look at the string behaviours too, thanks for the link.

Regards,
Iain
 
Brian Candler

Iain Barnett wrote in post #968914:
I'm going to look into Rack Request and
Response and perhaps patch them to enforce it there

I think that would be dangerous:

(1) The Rack spec already defines ASCII-8BIT for the body. It doesn't
define the return encoding from Rack::Utils.parse_query, but you'd be
changing from one undocumented behaviour to another, which would break
apps which expect it how it is today.

(2) How would you choose the encoding? Using the HTTP Content-Type
header? But that refers to the body, not to query parameters like GET
/foo?param=bar

Anyway, I'd say the fundamental problem you're seeing is that a==b but
sqlite3(a) != sqlite3(b), which can be demonstrated independently of
Rack. So I think sqlite3 is really where the fix is required.
 
Raito Yitsushi

Did you try lcs-diff?
 
Peter Vandenabeele

Brian Candler wrote in post #969026:
The strings are "equal", in the sense that Ruby says a==b is true. But
in ruby 1.9, some strings are more equal than others.

The specific bug he's hit is reproduced here:
http://www.ruby-forum.com/topic/476119#964668

Would it make sense then to include a method like
eql_with_encoding? on the String class?

$ irb # removed the 'ruby-1.9.2-p0>' prompts
s1 = "hello" #=> "hello"
s2 = "hello".force_encoding("US-ASCII") #=> "hello"
class String
  def eql_with_encoding?(other)
    other.is_a?(String) && self.eql?(other) &&
      self.encoding.eql?(other.encoding)
  end
end
s1.eql?(s2) #=> true
s1.eql_with_encoding?(s2) #=> false
 
